public inbox for [email protected]
* Re: generic plans and "initial" pruning
@ 2024-12-04 17:20 ` Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
1 sibling, 1 reply; 66+ messages in thread
From: Tomas Vondra @ 2024-12-04 17:20 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On 12/4/24 14:34, Amit Langote wrote:
> Hi Tomas,
>
> On Mon, Dec 2, 2024 at 3:36 AM Tomas Vondra <[email protected]> wrote:
>> Hi,
>>
>> I took a look at this patch, mostly to familiarize myself with the
>> pruning etc. I have a bunch of comments, but all of that is minor,
>> perhaps even nitpicking - with prior feedback from David, Tom and
>> Robert, I can't really compete with that.
>
> Thanks for looking at this. These are helpful.
>
>> FWIW the patch needs a rebase, there's a minor bitrot - but it was
>> simple enough to fix for a review / testing.
>>
>>
>> 0001
>> ----
>>
>> 1) But if we don't expect this error to actually happen, do we really
>> need to make it ereport()? Maybe it should be plain elog(). I mean, it's
>> "can't happen" and thus doesn't need translations etc.
>>
>>     if (!bms_equal(relids, pruneinfo->relids))
>>         ereport(ERROR,
>>                 errcode(ERRCODE_INTERNAL_ERROR),
>>                 errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
>>                                 part_prune_index),
>>                 errdetail_internal("plan node relids %s, pruneinfo relids %s",
>>                                    bmsToString(relids),
>>                                    bmsToString(pruneinfo->relids)));
>
> I'm fine with elog() here even if it causes the message to be longer:
>
> elog(ERROR, "mismatching PartitionPruneInfo found at part_prune_index
> %d (plan node relids %s, pruneinfo relids %s)
>

I'm not forcing you to do elog, if you think ereport() is better. I'm
only asking because AFAIK the "policy" is that ereport is for cases that
we think can happen (and thus get translated), while elog(ERROR) is for
cases that we believe shouldn't happen.

So every time I see "ereport" I ask myself "how could this happen" which
doesn't seem to be the case here.

>> Perhaps it should even be an assert?
>
> I am not sure about that. Having a message handy might be good if a
> user ends up hitting this case for whatever reason, like trying to run
> a corrupted plan.
>
I'm a bit skeptical about this, TBH. If we assume the plan is
"corrupted", why should we notice in this particular place? I mean, it
could be corrupted in a million different ways, and the chance that it
got through all the earlier steps is like 1 in 1,000,000.

>> 2) unnecessary newline added to execPartition.h
>
> Perhaps you meant "removed". Fixed.
>
Yes, sorry. I misread the diff.
>> 5) PlannerGlobal
>>
>> /* List of PartitionPruneInfo contained in the plan */
>> List *partPruneInfos;
>>
>> Why does this say "contained in the plan" unlike the other fields? Is
>> there some sort of difference? I'm not saying it's wrong.
>
> Ok, maybe the following is a bit more helpful and like the comment for
> other fields:
>
> /* "flat" list of PartitionPruneInfos */
> List *partPruneInfos;
>
WFM
>> 0002
>> ----
>>
>> 1) Isn't it weird/undesirable partkey_datum_from_expr() loses some of
>> the asserts? Would the assert be incorrect in the new implementation, or
>> are we removing it simply because we happen to not have one of the fields?
>
> The former -- the asserts would be incorrect in the new implementation
> -- because in the new implementation a standalone ExprContext is used
> that is independent of the parent PlanState (when available) for both
> types of runtime pruning.
>
> The old asserts, particularly the second one, weren't asserting
> something very useful anyway, IMO. What I mean is that the
> ExprContext provided in the PartitionPruneContext to be the same as
> the parent PlanState's ps_ExprContext isn't critical to the code that
> follows. Nor whether the PlanState is available or not.
>
OK, thanks for explaining
>> 2) inconsistent spelling: run-time vs. runtime
>
> I assume you meant in this comment:
>
> * estate The EState for the query doing runtime pruning
>
> Fixed by using run-time, which is a more commonly used term in the
> source code than runtime.
>
Not quite. I was looking at runtime/run-time in the patch files, but now
I realize some of that is preexisting ... Still, maybe the patch should
stick to one spelling.
>> 2) I'm not quite sure what "exec" partition pruning is?
>>
>> /*
>> * ExecInitPartitionPruning
>> * Initialize the data structures needed for runtime "exec" partition
>> * pruning and return the result of initial pruning, if available.
>>
>> Is that the same thing as "runtime pruning"?
>
> "Exec" pruning refers to pruning performed during execution, using
> PARAM_EXEC parameters. In contrast, "init" pruning occurs during plan
> initialization, using parameters whose values remain constant during
> execution, such as PARAM_EXTERN parameters and stable functions.
>
> Before this patch, the ExecInitPartitionPruning function, called
> during ExecutorStart(), performed "init" pruning and set up state in
> the PartitionPruneState for subsequent "exec" pruning during
> ExecutorRun(). With this patch, "init" pruning is performed well
> before this function is called, leaving its sole responsibility to
> setting up the state for "exec" pruning. It may be worth renaming the
> function to better reflect this new role, rather than updating only
> the comment.
>
> Actually, that is what I decided to do in the attached, along with
> some other adjustments like moving ExecDoInitialPruning() to
> execPartition.c from execMain.c, fixing up some obsolete comments,
> etc.
>
I don't see any attachment :-(

Anyway, if I understand correctly, the "runtime pruning" has two
separate cases - initial pruning and exec pruning. Is that right?

>
>>
>> 2) It may not be quite clear why ExecInitUpdateProjection() switches to
>> mt_updateColnosLists. Should that be explained in a comment, somewhere?
>
> There is a comment in the ModifyTableState struct definition:
>
> /*
> * List of valid updateColnosLists. Contains only those belonging to
> * unpruned relations from ModifyTable.updateColnosLists.
> */
> List *mt_updateColnosLists;
>
> It seems redundant to reiterate this in ExecInitUpdateProjection().
>
Ah, I see. Makes sense.
>
>> 0005
>> ----
>>
>> 1) auto_explain.c - So what happens if the plan gets invalidated? The
>> hook explain_ExecutorStart returns early, but then what? Does that break
>> the user session somehow, or what?
>
> It will get called again after ExecutorStartExt() loops back to do
> ExecutorStart() with a new updated plan tree.
>
>> 2) Isn't it a bit fragile if this requires every extension to update
>> and add the ExecPlanStillValid() calls to various places?
>
> The ExecPlanStillValid() call only needs to be added immediately after
> the call to standard_ExecutorStart() in an extension's
> ExecutorStart_hook() implementation.
>
>> What if an
>> extension doesn't do that? What weirdness will happen?
>
> The QueryDesc.planstate won't contain a PlanState tree for starters
> and other state information that InitPlan() populates in EState based
> on the PlannedStmt.
>
OK, and the consequence is that the query will fail, right?
>> Maybe it'd be
>> possible to at least check this in some other executor hook? Or at least
>> we could ensure the check was done in assert-enabled builds? Or
>> something to make extension authors aware of this?
>
> I've added a note in the commit message, but if that's not enough, one
> idea might be to change the return type of ExecutorStart_hook so that
> the extensions that implement it are forced to be adjusted. Say, from
> void to bool to indicate whether standard_ExecutorStart() succeeded
> and thus created a "valid" plan. I had that in the previous versions
> of the patch. Thoughts?
>
Maybe. My concern is that this case (plan getting invalidated) is fairly
rare, so it's entirely plausible the extension will seem to work just
fine without the code update for a long time.

Sure, changing the APIs is allowed, I'm just wondering if maybe there
might be a way to not have this issue, or at least notice the missing
call early.

I haven't tried, but wouldn't it be better to modify ExecutorStart() to do
the retries internally? I mean, the extensions wouldn't need to check if
the plan is still valid, ExecutorStart() would take care of that. Yeah,
it might need some new arguments, but that's more obvious.

>> Aside from going through the patches, I did a simple benchmark to see
>> how this works in practice. I did a simple test, with pgbench -S and
>> variable number of partitions/clients. I also varied the number of locks
>> per transaction, because I was wondering if it may interact with the
>> fast-path improvements. See the attached xeon.sh script and CSV with
>> results from the 44/88-core machine.
>>
>> There are also two PDFs visualizing the results, to show the impact as a
>> difference between "master" (no patches) vs. "pruning" build with v57
>> applied. As usual, "green" is good (faster), "red" is bad (slower).
>>
>> For most combinations of parameters, there's no impact on throughput.
>> Anything in 99-101% is just regular noise, possibly even more. I'm
>> trying to reduce the noise a bit more, but this seems acceptable. I'd
>> like to discuss three "cases" I see in the results:
>
> Thanks for doing these benchmarks. I'll reply separately to discuss
> the individual cases.
>
>> costing / auto mode
>> -------------------
>>
>> Anyway, this leads me to a related question - not quite a "bug" in the
>> patch, but something to perhaps think about. And that's costing, and
>> what "auto" should do.
>>
>> There are two PNG charts, showing throughput for runs with -M prepared
>> and 1000 partitions. Each chart shows throughput for the three cache
>> modes, and different client counts. There's a clear distinction between
>> "master" and "patched" runs - the "generic" plans performed terribly, by
>> orders of magnitude. With the patches it beats the "custom" plans.
>>
>> Which is great! But it also means that while "auto" used to do the right
>> thing, with the patches that's not the case.
>>
>> AFAIK that's because we don't consider the runtime pruning when costing
>> the plans, so the cost is calculated as if no pruning happened. And so
>> it seems way more expensive than it should ... and it loses with the
>> custom scans. Is that correct, or do I understand this wrong?
>
> That's correct. The planner does not consider runtime pruning when
> assigning costs to Append or MergeAppend paths in
> create_{merge}append_path().
>
>> Just to be clear, I'm not claiming the patch has to deal with this. I
>> suppose it can be handled as a future improvement, and I'm not even sure
>> there's a good way to consider this during costing. For example, can we
>> estimate how many partitions will be pruned?
>
> There have been discussions about this in the 2017 development thread
> of run-time pruning [1] and likely at some later point in other
> threads. One simple approach mentioned at [1] is to consider that
> only 1 partition will be scanned for queries containing WHERE partkey
> = $1, because only 1 partition can contain matching rows with that
> condition.
>
> I agree that this should be dealt with sooner than later so users get
> generic plans even without having to use force_generic_plan.
>
> I'll post the updated patches tomorrow.
>
Cool, thanks!
regards
--
Tomas Vondra
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
@ 2024-12-05 06:53 ` Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 13:53 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
0 siblings, 2 replies; 66+ messages in thread
From: Amit Langote @ 2024-12-05 06:53 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
> On 12/4/24 14:34, Amit Langote wrote:
> > On Mon, Dec 2, 2024 at 3:36 AM Tomas Vondra <[email protected]> wrote:
> >> 0001
> >> ----
> >>
> >> 1) But if we don't expect this error to actually happen, do we really
> >> need to make it ereport()? Maybe it should be plain elog(). I mean, it's
> >> "can't happen" and thus doesn't need translations etc.
> >>
> >>     if (!bms_equal(relids, pruneinfo->relids))
> >>         ereport(ERROR,
> >>                 errcode(ERRCODE_INTERNAL_ERROR),
> >>                 errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
> >>                                 part_prune_index),
> >>                 errdetail_internal("plan node relids %s, pruneinfo relids %s",
> >>                                    bmsToString(relids),
> >>                                    bmsToString(pruneinfo->relids)));
> >
> > I'm fine with elog() here even if it causes the message to be longer:
> >
> > elog(ERROR, "mismatching PartitionPruneInfo found at part_prune_index
> > %d (plan node relids %s, pruneinfo relids %s)
> >
>
> I'm not forcing you to do elog, if you think ereport() is better. I'm
> only asking because AFAIK the "policy" is that ereport is for cases that
> we think can happen (and thus get translated), while elog(ERROR) is for
> cases that we believe shouldn't happen.
>
> So every time I see "ereport" I ask myself "how could this happen" which
> doesn't seem to be the case here.
>
> >> Perhaps it should even be an assert?
> >
> > I am not sure about that. Having a message handy might be good if a
> > user ends up hitting this case for whatever reason, like trying to run
> > a corrupted plan.
>
> I'm a bit skeptical about this, TBH. If we assume the plan is
> "corrupted", why should we notice in this particular place? I mean, it
> could be corrupted in a million different ways, and the chance that it
> got through all the earlier steps is like 1 in 1,000,000.

Yeah, I am starting to think the same. Btw, the idea to have a check
and elog() / ereport() came from Alvaro upthread:

https://www.postgresql.org/message-id/20221130181201.mfinyvtob3j5i2a6%40alvherre.pgsql

> >> 2) I'm not quite sure what "exec" partition pruning is?
> >>
> >> /*
> >> * ExecInitPartitionPruning
> >> * Initialize the data structures needed for runtime "exec" partition
> >> * pruning and return the result of initial pruning, if available.
> >>
> >> Is that the same thing as "runtime pruning"?
> >
> > "Exec" pruning refers to pruning performed during execution, using
> > PARAM_EXEC parameters. In contrast, "init" pruning occurs during plan
> > initialization, using parameters whose values remain constant during
> > execution, such as PARAM_EXTERN parameters and stable functions.
> >
> > Before this patch, the ExecInitPartitionPruning function, called
> > during ExecutorStart(), performed "init" pruning and set up state in
> > the PartitionPruneState for subsequent "exec" pruning during
> > ExecutorRun(). With this patch, "init" pruning is performed well
> > before this function is called, leaving its sole responsibility to
> > setting up the state for "exec" pruning. It may be worth renaming the
> > function to better reflect this new role, rather than updating only
> > the comment.
> >
> > Actually, that is what I decided to do in the attached, along with
> > some other adjustments like moving ExecDoInitialPruning() to
> > execPartition.c from execMain.c, fixing up some obsolete comments,
> > etc.
> >
>
> I don't see any attachment :-(
>
> Anyway, if I understand correctly, the "runtime pruning" has two
> separate cases - initial pruning and exec pruning. Is that right?
That's correct. These patches are about performing "initial" pruning
at a different time and place so that we can take the deferred locks
on the unpruned partitions before we perform ExecInitNode() on any of
the plan trees in the PlannedStmt.
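
To make the distinction concrete, here is a toy model in C. It is not PostgreSQL code: the Range type and the init_prune()/exec_prune() helpers are made up for illustration, and the quals are assumed to be "partkey >= $1" for the init step and "partkey = <exec param>" for the exec step. Initial pruning runs once, with a value that stays fixed for the whole execution, and decides which subplans are initialized at all; exec pruning runs again for every new value of an executor-internal parameter and can only choose among the survivors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical range partition covering keys in [lo, hi). */
typedef struct Range
{
    int lo;
    int hi;
} Range;

/*
 * Toy "init" pruning: min_key plays the role of a PARAM_EXTERN value that
 * is known before execution starts.  A partition survives if it may hold
 * keys >= min_key.  Returns the number of surviving partitions and marks
 * them in initialized[].
 */
static int
init_prune(const Range *parts, int nparts, int min_key, bool *initialized)
{
    int nkept = 0;

    for (int i = 0; i < nparts; i++)
    {
        initialized[i] = parts[i].hi > min_key;
        if (initialized[i])
            nkept++;
    }
    return nkept;
}

/*
 * Toy "exec" pruning: key plays the role of a PARAM_EXEC value that changes
 * during execution.  Only partitions that survived init pruning are
 * considered.  Returns the index of the partition to scan, or -1 if none.
 */
static int
exec_prune(const Range *parts, int nparts, const bool *initialized, int key)
{
    for (int i = 0; i < nparts; i++)
    {
        if (initialized[i] && key >= parts[i].lo && key < parts[i].hi)
            return i;
    }
    return -1;
}
```

With partitions [0,10), [10,20) and [20,30) and min_key = 10, init_prune() keeps only the last two. In terms of the patches, those are the partitions that would get the deferred locks; exec_prune() is then called once per parameter value and never looks at the first partition.
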
> >> 0005
> >> ----
> >>
> >> 1) auto_explain.c - So what happens if the plan gets invalidated? The
> >> hook explain_ExecutorStart returns early, but then what? Does that break
> >> the user session somehow, or what?
> >
> > It will get called again after ExecutorStartExt() loops back to do
> > ExecutorStart() with a new updated plan tree.
> >
> >> 2) Isn't it a bit fragile if this requires every extension to update
> >> and add the ExecPlanStillValid() calls to various places?
> >
> > The ExecPlanStillValid() call only needs to be added immediately after
> > the call to standard_ExecutorStart() in an extension's
> > ExecutorStart_hook() implementation.
> >
> >> What if an
> >> extension doesn't do that? What weirdness will happen?
> >
> > The QueryDesc.planstate won't contain a PlanState tree for starters
> > and other state information that InitPlan() populates in EState based
> > on the PlannedStmt.
>
> OK, and the consequence is that the query will fail, right?
No, the core executor will retry the execution with a new updated
plan. In the absence of the early return, the extension might even
crash when accessing such an incomplete QueryDesc.

What the patch makes the ExecutorStart_hook do is similar to how
InitPlan() will return early when locks taken on partitions that
survive initial pruning invalidate the plan.
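
As a rough sketch of that early-return pattern, the following self-contained C program mocks the pieces involved. The EState and QueryDesc structs here are simplified stand-ins, standard_ExecutorStart() and ExecPlanStillValid() are mock versions that only mimic the behaviour described in this thread, and my_ExecutorStart() is a hypothetical extension hook. The one part taken from the patch is the shape of the check placed right after the call to standard_ExecutorStart().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins, NOT the real executor structs. */
typedef struct EState
{
    bool plan_valid;        /* mock: did the plan survive locking? */
} EState;

typedef struct QueryDesc
{
    EState *estate;
    bool    planstate_built;    /* mock for "QueryDesc.planstate is set" */
} QueryDesc;

/* Mock of the validity check the patch asks hooks to perform. */
static bool
ExecPlanStillValid(EState *estate)
{
    return estate->plan_valid;
}

/* Mock: the PlanState tree is only built when the plan is still valid. */
static void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    (void) eflags;
    queryDesc->planstate_built = queryDesc->estate->plan_valid;
}

/* Counts how often the hook's own post-start work actually ran. */
static int hook_work_done = 0;

/* Hypothetical extension hook using the early-return pattern. */
static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    standard_ExecutorStart(queryDesc, eflags);

    /* The plan may have become invalid during standard_ExecutorStart() */
    if (!ExecPlanStillValid(queryDesc->estate))
        return;

    /* Only reached with a fully initialized QueryDesc. */
    assert(queryDesc->planstate_built);
    hook_work_done++;
}
```

Without the early return, the hook's own work would run against a QueryDesc whose PlanState tree was never built, which is the situation in which an extension might crash.
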
> >> Maybe it'd be
> >> possible to at least check this in some other executor hook? Or at least
> >> we could ensure the check was done in assert-enabled builds? Or
> >> something to make extension authors aware of this?
> >
> > I've added a note in the commit message, but if that's not enough, one
> > idea might be to change the return type of ExecutorStart_hook so that
> > the extensions that implement it are forced to be adjusted. Say, from
> > void to bool to indicate whether standard_ExecutorStart() succeeded
> > and thus created a "valid" plan. I had that in the previous versions
> > of the patch. Thoughts?
>
> Maybe. My concern is that this case (plan getting invalidated) is fairly
> rare, so it's entirely plausible the extension will seem to work just
> fine without the code update for a long time.
You might see errors like the one below when the core executor or
a hook tries to initialize, or otherwise process, a plan that is known
to be invalid, for example, because an unpruned partition's index got
concurrently dropped before the executor got the lock:

ERROR: could not open relation with OID xxx

> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> might be a way to not have this issue, or at least notice the missing
> call early.
>
> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> the retries internally? I mean, the extensions wouldn't need to check if
> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> it might need some new arguments, but that's more obvious.
One approach could be to move some code from standard_ExecutorStart()
into ExecutorStart(). Specifically, the code responsible for setting
up enough state in the EState to perform ExecDoInitialPruning(), which
takes locks that might invalidate the plan. If the plan does become
invalid, the hook and standard_ExecutorStart() are not called.
Instead, the caller, ExecutorStartExt() in this case, creates a new
plan.
This avoids the need to add ExecPlanStillValid() checks anywhere,
whether in core or extension code. However, it does mean accessing the
PlannedStmt earlier than InitPlan(), but the current placement of the
code is not exactly set in stone.
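
The control flow of that approach can be sketched with a small self-contained C model. Everything in it is hypothetical: Plan, prune_and_lock(), replan(), start_hook() and start_with_retry() are illustrative names, and the "pending invalidations" counter simply stands in for concurrent DDL. The point is only the ordering: pruning and locking happen first, and the hook runs only once they have left the plan valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cached plan: generation is bumped on every re-plan. */
typedef struct Plan
{
    int  generation;
    bool valid;
} Plan;

/*
 * Stand-in for initial pruning plus locking of unpruned partitions.
 * Each pending invalidation models concurrent DDL that is noticed while
 * taking the locks.  Returns true if the plan is still valid afterwards.
 */
static bool
prune_and_lock(Plan *plan, int *pending_invalidations)
{
    if (*pending_invalidations > 0)
    {
        (*pending_invalidations)--;
        plan->valid = false;
    }
    return plan->valid;
}

/* Stand-in for creating a new plan after invalidation. */
static void
replan(Plan *plan)
{
    plan->generation++;
    plan->valid = true;
}

static int hook_calls = 0;
static int hook_saw_invalid = 0;

/* Stand-in for an extension's start hook; it contains no validity check. */
static void
start_hook(const Plan *plan)
{
    hook_calls++;
    if (!plan->valid)
        hook_saw_invalid++;
}

/*
 * Retry loop: the hook is skipped entirely for a plan that was invalidated
 * by locking.  Returns the number of attempts that were needed.
 */
static int
start_with_retry(Plan *plan, int *pending_invalidations)
{
    int attempts = 0;

    for (;;)
    {
        attempts++;
        if (prune_and_lock(plan, pending_invalidations))
        {
            start_hook(plan);
            return attempts;
        }
        replan(plan);
    }
}
```

In this model the hook needs no check of its own, because it is never handed an invalid plan, no matter how many times the loop had to re-plan.
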
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2024-12-05 11:28 ` Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
1 sibling, 1 reply; 66+ messages in thread
From: Amit Langote @ 2024-12-05 11:28 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <[email protected]> wrote:
> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
> > Sure, changing the APIs is allowed, I'm just wondering if maybe there
> > might be a way to not have this issue, or at least notice the missing
> > call early.
> >
> > I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> > the retries internally? I mean, the extensions wouldn't need to check if
> > the plan is still valid, ExecutorStart() would take care of that. Yeah,
> > it might need some new arguments, but that's more obvious.
>
> One approach could be to move some code from standard_ExecutorStart()
> into ExecutorStart(). Specifically, the code responsible for setting
> up enough state in the EState to perform ExecDoInitialPruning(), which
> takes locks that might invalidate the plan. If the plan does become
> invalid, the hook and standard_ExecutorStart() are not called.
> Instead, the caller, ExecutorStartExt() in this case, creates a new
> plan.
>
> This avoids the need to add ExecPlanStillValid() checks anywhere,
> whether in core or extension code. However, it does mean accessing the
> PlannedStmt earlier than InitPlan(), but the current placement of the
> code is not exactly set in stone.
I tried this approach and found that it essentially disables testing
of this patch using the delay_execution module, which relies on the
ExecutorStart_hook(). The way the testing works is that the hook in
delay_execution.c pauses the execution of a cached plan to allow a
concurrent session to drop an index referenced in the plan. When
unpaused, execution initialization resumes by calling
standard_ExecutorStart(). At this point, obtaining the lock on the
partition whose index has been dropped invalidates the plan, which the
hook detects and reports. It then also reports the successful
re-execution of an updated plan that no longer references the dropped
index. Hmm.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2024-12-05 14:07 ` Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Tomas Vondra @ 2024-12-05 14:07 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On 12/5/24 12:28, Amit Langote wrote:
> On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <[email protected]> wrote:
>> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
>>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
>>> might be a way to not have this issue, or at least notice the missing
>>> call early.
>>>
>>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
>>> the retries internally? I mean, the extensions wouldn't need to check if
>>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
>>> it might need some new arguments, but that's more obvious.
>>
>> One approach could be to move some code from standard_ExecutorStart()
>> into ExecutorStart(). Specifically, the code responsible for setting
>> up enough state in the EState to perform ExecDoInitialPruning(), which
>> takes locks that might invalidate the plan. If the plan does become
>> invalid, the hook and standard_ExecutorStart() are not called.
>> Instead, the caller, ExecutorStartExt() in this case, creates a new
>> plan.
>>
>> This avoids the need to add ExecPlanStillValid() checks anywhere,
>> whether in core or extension code. However, it does mean accessing the
>> PlannedStmt earlier than InitPlan(), but the current placement of the
>> code is not exactly set in stone.
>
> I tried this approach and found that it essentially disables testing
> of this patch using the delay_execution module, which relies on the
> ExecutorStart_hook(). The way the testing works is that the hook in
> delay_execution.c pauses the execution of a cached plan to allow a
> concurrent session to drop an index referenced in the plan. When
> unpaused, execution initialization resumes by calling
> standard_ExecutorStart(). At this point, obtaining the lock on the
> partition whose index has been dropped invalidates the plan, which the
> hook detects and reports. It then also reports the successful
> re-execution of an updated plan that no longer references the dropped
> index. Hmm.
>
It's not clear to me why the change disables this testing, and I can't
try without a patch. Could you explain?
thanks
--
Tomas Vondra
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
@ 2024-12-06 08:18 ` Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2024-12-06 08:18 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Thu, Dec 5, 2024 at 11:07 PM Tomas Vondra <[email protected]> wrote:
> On 12/5/24 12:28, Amit Langote wrote:
> > On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <[email protected]> wrote:
> >> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
> >>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> >>> might be a way to not have this issue, or at least notice the missing
> >>> call early.
> >>>
> >>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> >>> the retries internally? I mean, the extensions wouldn't need to check if
> >>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> >>> it might need some new arguments, but that's more obvious.
> >>
> >> One approach could be to move some code from standard_ExecutorStart()
> >> into ExecutorStart(). Specifically, the code responsible for setting
> >> up enough state in the EState to perform ExecDoInitialPruning(), which
> >> takes locks that might invalidate the plan. If the plan does become
> >> invalid, the hook and standard_ExecutorStart() are not called.
> >> Instead, the caller, ExecutorStartExt() in this case, creates a new
> >> plan.
> >>
> >> This avoids the need to add ExecPlanStillValid() checks anywhere,
> >> whether in core or extension code. However, it does mean accessing the
> >> PlannedStmt earlier than InitPlan(), but the current placement of the
> >> code is not exactly set in stone.
> >
> > I tried this approach and found that it essentially disables testing
> > of this patch using the delay_execution module, which relies on the
> > ExecutorStart_hook(). The way the testing works is that the hook in
> > delay_execution.c pauses the execution of a cached plan to allow a
> > concurrent session to drop an index referenced in the plan. When
> > unpaused, execution initialization resumes by calling
> > standard_ExecutorStart(). At this point, obtaining the lock on the
> > partition whose index has been dropped invalidates the plan, which the
> > hook detects and reports. It then also reports the successful
> > re-execution of an updated plan that no longer references the dropped
> > index. Hmm.
> >
>
> It's not clear to me why the change disables this testing, and I can't
> try without a patch. Could you explain?
Sorry, PFA the delta patch for the change I described above. It
applies on top of the v58 series of patches that I posted yesterday.
You'll notice that the delay_execution test fails if you apply it and run
check-world.

As for how the change breaks the testing, here is a before and after
of the flow of an isolation test in
src/test/modules/delay_execution/specs/cached-plan-inval.spec (s1 is
the session used to run a cached plan, s2 to perform concurrent DDL
that invalidates the plan):
* Before (working):
1. s2 takes advisory lock
2. s1 runs cached plan -> goes to ExecutorStart_hook -> waits for the
advisory lock
3. s2 drops an index referenced in the plan
4. s2 unlocks advisory lock
5. s1 locks unpruned partitions -> detects plan invalidation due to
dropped index.
* After (stops working because initial pruning and locking are done
before calling ExecutorStart_hook):
1. s2 takes advisory lock
2. s1 runs cached plan -> locks unpruned partitions -> goes to
ExecutorStart_hook to get advisory lock -> waits for advisory lock
3. s2 drops an index referenced in the plan -> waits for lock on the
unpruned partition -> deadlock!
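
The two flows can be replayed with a very small lock model in C. This is a simplification for illustration only: there are just two sessions and two locks, lock modes are ignored, and LockState, try_acquire(), deadlocked() and flow_deadlocks() are hypothetical names rather than anything in PostgreSQL or the test module.

```c
#include <assert.h>
#include <stdbool.h>

enum { ADVISORY, PARTITION, NRES };     /* the two locks in the test */
enum { S1, S2, NSESS };                 /* the two sessions */

typedef struct LockState
{
    bool holds[NSESS][NRES];
    int  waits_for[NSESS];      /* resource a session is blocked on, or -1 */
} LockState;

/* Returns the session holding res, or -1 if it is free. */
static int
holder_of(const LockState *st, int res)
{
    for (int s = 0; s < NSESS; s++)
        if (st->holds[s][res])
            return s;
    return -1;
}

/* Grants the lock if it is free (or already ours); otherwise records the wait. */
static bool
try_acquire(LockState *st, int sess, int res)
{
    int h = holder_of(st, res);

    if (h == -1 || h == sess)
    {
        st->holds[sess][res] = true;
        st->waits_for[sess] = -1;
        return true;
    }
    st->waits_for[sess] = res;
    return false;
}

/* Two-session case only: deadlock iff each waits on a lock the other holds. */
static bool
deadlocked(const LockState *st)
{
    for (int s = 0; s < NSESS; s++)
    {
        int res = st->waits_for[s];
        int h;

        if (res < 0)
            return false;
        h = holder_of(st, res);
        if (h < 0 || h == s)
            return false;
    }
    return true;
}

/* Replays the flow up to the DROP INDEX step; reports whether it deadlocks. */
static bool
flow_deadlocks(bool lock_before_hook)
{
    LockState st = { .waits_for = { -1, -1 } };

    try_acquire(&st, S2, ADVISORY);         /* s2 takes advisory lock */
    if (lock_before_hook)
        try_acquire(&st, S1, PARTITION);    /* "after": s1 locks partitions first */
    try_acquire(&st, S1, ADVISORY);         /* s1 blocks in the hook */
    try_acquire(&st, S2, PARTITION);        /* s2's DROP INDEX needs the partition */
    return deadlocked(&st);
}
```

In this model flow_deadlocks(false) corresponds to the "before" flow, where s2 gets the partition lock without waiting, and flow_deadlocks(true) to the "after" flow, where each session ends up waiting on a lock the other one holds.
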
One idea I had after sending the email yesterday is to introduce
ExecutorStartCachedPlan_hook for the advisory lock based waiting.
ExecutorStartCachedPlan() is the new function that you will find in
v58-0004 that wraps ExecutorStart() to handle plan invalidation. This
new hook would be called before ExecutorStartCachedPlan() calls
ExecutorStart(), so the original testing flow can still work.

Another idea might be to use the injection points infrastructure to
introduce the wait instead of the combination of an executor hook and
an advisory lock.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] pruning-in-ExecutorStart.diff (13.6K, 2-pruning-in-ExecutorStart.diff)
download | inline diff:
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 8b5eaf3ef3..623a674f99 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -298,10 +298,6 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
     else
         standard_ExecutorStart(queryDesc, eflags);
 
-    /* The plan may have become invalid during standard_ExecutorStart() */
-    if (!ExecPlanStillValid(queryDesc->estate))
-        return;
-
     if (auto_explain_enabled())
     {
         /*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index b11691ae26..49c657b3e0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -994,10 +994,6 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
     else
         standard_ExecutorStart(queryDesc, eflags);
 
-    /* The plan may have become invalid during standard_ExecutorStart() */
-    if (!ExecPlanStillValid(queryDesc->estate))
-        return;
-
     /*
      * If query has queryId zero, don't track it. This prevents double
      * counting of optimizable statements that are directly contained in
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9543d9490c..18758287bf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -123,6 +123,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ EState *estate;
+ MemoryContext oldcontext;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *rangeTable = plannedstmt->rtable;
+ CachedPlan *cachedplan = queryDesc->cplan;
+
+ /* sanity checks: queryDesc must not be started already */
+ Assert(queryDesc != NULL);
+ Assert(queryDesc->estate == NULL);
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -133,6 +143,117 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+ /*
+ * Build EState, switch into per-query memory context for startup.
+ */
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+ /*
+ * Fill in external parameters, if any, from queryDesc; and allocate
+ * workspace for internal parameters
+ */
+ estate->es_param_list_info = queryDesc->params;
+
+ if (queryDesc->plannedstmt->paramExecTypes != NIL)
+ {
+ int nParamExec;
+
+ nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
+ estate->es_param_exec_vals = (ParamExecData *)
+ palloc0(nParamExec * sizeof(ParamExecData));
+ }
+
+ /* We now require all callers to provide sourceText */
+ Assert(queryDesc->sourceText != NULL);
+ estate->es_sourceText = queryDesc->sourceText;
+
+ /*
+ * Fill in the query environment, if any, from queryDesc.
+ */
+ estate->es_queryEnv = queryDesc->queryEnv;
+
+ /*
+ * If non-read-only query, set the command ID to mark output tuples with
+ */
+ switch (queryDesc->operation)
+ {
+ case CMD_SELECT:
+
+ /*
+ * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
+ * tuples
+ */
+ if (queryDesc->plannedstmt->rowMarks != NIL ||
+ queryDesc->plannedstmt->hasModifyingCTE)
+ estate->es_output_cid = GetCurrentCommandId(true);
+
+ /*
+ * A SELECT without modifying CTEs can't possibly queue triggers,
+ * so force skip-triggers mode. This is just a marginal efficiency
+ * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
+ * all that expensive, but we might as well do it.
+ */
+ if (!queryDesc->plannedstmt->hasModifyingCTE)
+ eflags |= EXEC_FLAG_SKIP_TRIGGERS;
+ break;
+
+ case CMD_INSERT:
+ case CMD_DELETE:
+ case CMD_UPDATE:
+ case CMD_MERGE:
+ estate->es_output_cid = GetCurrentCommandId(true);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized operation code: %d",
+ (int) queryDesc->operation);
+ break;
+ }
+
+ /*
+ * Copy other important information into the EState
+ */
+ estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
+ estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
+ estate->es_top_eflags = eflags;
+ estate->es_instrument = queryDesc->instrument_options;
+ estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
+
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_unpruned_relids = bms_copy(plannedstmt->unprunableRelids);
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+
+ /*
+ * initialize the node's execution state
+ */
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+
+ /*
+ * Perform runtime "initial" pruning to identify which child subplans,
+ * corresponding to the children of plan nodes that contain
+ * PartitionPruneInfo such as Append, will not be executed. The results,
+ * which are bitmapsets of indexes of the child subplans that will be
+ * executed, are saved in es_part_prune_results. These results correspond
+ * to each PartitionPruneInfo entry, and the es_part_prune_results list is
+ * parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate, cachedplan);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ if (cachedplan && !CachedPlanValid(cachedplan))
+ return;
+
if (ExecutorStart_hook)
(*ExecutorStart_hook) (queryDesc, eflags);
else
@@ -198,12 +319,12 @@ ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- EState *estate;
MemoryContext oldcontext;
+ EState *estate;
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
+ Assert(queryDesc->estate != NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -227,85 +348,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
ExecCheckXactReadOnly(queryDesc->plannedstmt);
- /*
- * Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
-
- oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
-
- /*
- * Fill in external parameters, if any, from queryDesc; and allocate
- * workspace for internal parameters
- */
- estate->es_param_list_info = queryDesc->params;
-
- if (queryDesc->plannedstmt->paramExecTypes != NIL)
- {
- int nParamExec;
-
- nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
- estate->es_param_exec_vals = (ParamExecData *)
- palloc0(nParamExec * sizeof(ParamExecData));
- }
-
- /* We now require all callers to provide sourceText */
- Assert(queryDesc->sourceText != NULL);
- estate->es_sourceText = queryDesc->sourceText;
-
- /*
- * Fill in the query environment, if any, from queryDesc.
- */
- estate->es_queryEnv = queryDesc->queryEnv;
-
- /*
- * If non-read-only query, set the command ID to mark output tuples with
- */
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
-
- /*
- * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
- * tuples
- */
- if (queryDesc->plannedstmt->rowMarks != NIL ||
- queryDesc->plannedstmt->hasModifyingCTE)
- estate->es_output_cid = GetCurrentCommandId(true);
-
- /*
- * A SELECT without modifying CTEs can't possibly queue triggers,
- * so force skip-triggers mode. This is just a marginal efficiency
- * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
- * all that expensive, but we might as well do it.
- */
- if (!queryDesc->plannedstmt->hasModifyingCTE)
- eflags |= EXEC_FLAG_SKIP_TRIGGERS;
- break;
-
- case CMD_INSERT:
- case CMD_DELETE:
- case CMD_UPDATE:
- case CMD_MERGE:
- estate->es_output_cid = GetCurrentCommandId(true);
- break;
-
- default:
- elog(ERROR, "unrecognized operation code: %d",
- (int) queryDesc->operation);
- break;
- }
-
- /*
- * Copy other important information into the EState
- */
- estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
- estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
- estate->es_top_eflags = eflags;
- estate->es_instrument = queryDesc->instrument_options;
- estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
-
/*
* Set up an AFTER-trigger statement context, unless told not to, or
* unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
@@ -313,6 +355,9 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
AfterTriggerBeginQuery();
+ estate = queryDesc->estate;
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
/*
* Initialize the plan state tree
*/
@@ -922,46 +967,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
- CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
-
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = cachedplan;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
- estate->es_unpruned_relids = bms_copy(plannedstmt->unprunableRelids);
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- *
- * This will also add the RT indexes of surviving leaf partitions to
- * es_unpruned_relids.
- */
- ExecDoInitialPruning(estate);
-
- if (!ExecPlanStillValid(estate))
- return;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b349ccb211..126f330ef8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1812,7 +1812,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* use the same index to retrieve the pruning results.
*/
void
-ExecDoInitialPruning(EState *estate)
+ExecDoInitialPruning(EState *estate, CachedPlan *cplan)
{
ListCell *lc;
List *locked_relids = NIL;
@@ -1838,7 +1838,7 @@ ExecDoInitialPruning(EState *estate)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- if (ExecShouldLockRelations(estate))
+ if (cplan && CachedPlanRequiresLocking(cplan))
{
int rtindex = -1;
@@ -1866,7 +1866,7 @@ ExecDoInitialPruning(EState *estate)
* Release the useless locks if the plan won't be executed. This is the
* same as what CheckCachedPlan() in plancache.c does.
*/
- if (!ExecPlanStillValid(estate))
+ if (cplan && !CachedPlanValid(cplan))
{
foreach(lc, locked_relids)
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index a0843481f7..95d886884c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -17,6 +17,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
+#include "utils/plancache.h"
/* See execPartition.c for the definitions. */
typedef struct PartitionDispatchData *PartitionDispatch;
@@ -136,7 +137,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
-void ExecDoInitialPruning(EState *estate);
+void ExecDoInitialPruning(EState *estate, CachedPlan *cplan);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6d72f7d9d6..6a7ca37753 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,31 +265,6 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
-/*
- * Is the CachedPlan in es_cachedplan still valid?
- *
- * Called from InitPlan() because invalidation messages that affect the plan
- * might be received after locks have been taken on runtime-prunable relations.
- * The caller should take appropriate action if the plan has become invalid.
- */
-static inline bool
-ExecPlanStillValid(EState *estate)
-{
- return estate->es_cachedplan == NULL ? true :
- CachedPlanValid(estate->es_cachedplan);
-}
-
-/*
- * Locks are needed only if running a cached plan that might contain unlocked
- * relations, such as a reused generic plan.
- */
-static inline bool
-ExecShouldLockRelations(EState *estate)
-{
- return estate->es_cachedplan == NULL ? false :
- CachedPlanRequiresLocking(estate->es_cachedplan);
-}
-
/* ----------------------------------------------------------------
* ExecProcNode
*
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2024-12-09 07:10 ` Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2024-12-09 07:10 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Fri, Dec 6, 2024 at 5:18 PM Amit Langote <[email protected]> wrote:
> On Thu, Dec 5, 2024 at 11:07 PM Tomas Vondra <[email protected]> wrote:
> > On 12/5/24 12:28, Amit Langote wrote:
> > > On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <[email protected]> wrote:
> > >> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
> > >>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> > >>> might be a way to not have this issue, or at least notice the missing
> > >>> call early.
> > >>>
> > >>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> > >>> the retries internally? I mean, the extensions wouldn't need to check if
> > >>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> > >>> it might need some new arguments, but that's more obvious.
> > >>
> > >> One approach could be to move some code from standard_ExecutorStart()
> > >> into ExecutorStart(). Specifically, the code responsible for setting
> > >> up enough state in the EState to perform ExecDoInitialPruning(), which
> > >> takes locks that might invalidate the plan. If the plan does become
> > >> invalid, the hook and standard_ExecutorStart() are not called.
> > >> Instead, the caller, ExecutorStartExt() in this case, creates a new
> > >> plan.
> > >>
> > >> This avoids the need to add ExecPlanStillValid() checks anywhere,
> > >> whether in core or extension code. However, it does mean accessing the
> > >> PlannedStmt earlier than InitPlan(), but the current placement of the
> > >> code is not exactly set in stone.
> > >
> > > I tried this approach and found that it essentially disables testing
> > > of this patch using the delay_execution module, which relies on the
> > > ExecutorStart_hook(). The way the testing works is that the hook in
> > > delay_execution.c pauses the execution of a cached plan to allow a
> > > concurrent session to drop an index referenced in the plan. When
> > > unpaused, execution initialization resumes by calling
> > > standard_ExecutorStart(). At this point, obtaining the lock on the
> > > partition whose index has been dropped invalidates the plan, which the
> > > hook detects and reports. It then also reports the successful
> > > re-execution of an updated plan that no longer references the dropped
> > > index. Hmm.
> > >
> >
> > It's not clear to me why the change disables this testing, and I can't
> > try without a patch. Could you explain?
>
> Sorry, PFA the delta patch for the change I described above. It
> applies on top of v58 series of patches that I posted yesterday.
> You'll notice that delay_execution test fails if you apply and do
> check-world.
>
> As for how the change breaks the testing, here is a before and after
> of the flow of a isolation test in
> src/test/modules/delay_execution/specs/cached-plan-inval.spec (s1 is
> the session used to run a cached plan, s2 to perform concurrent DDL
> that invalidates the plan):
>
> * Before (working):
>
> 1. s2 takes advisory lock
> 2. s1 runs cached plan -> goes to ExecutorStart_hook -> waits for the
> advisory lock
> 3. s2 drops an index referenced in the plan
> 4. s2 unlocks advisory lock
> 5. s1 locks unpruned partitions -> detects plan invalidation due to
> dropped index.
>
> * After (stops working because initial pruning and locking are done
> before calling ExecutorStart_hook):
>
> 1. s2 takes advisory lock
> 2. s1 runs cached plan -> locks unpruned partitions -> goes to
> ExecutorStart_hook to get advisory lock -> waits for advisory lock
> 3. s2 drops an index referenced in the plan -> waits for lock on the
> unpruned partition -> deadlock!
>
> One idea I had after sending the email yesterday is to introduce
> ExecutorStartCachedPlan_hook for the advisory lock based waiting.
> ExecutorStartCachedPlan() is the new function that you will find in
> v58-0004 that wraps ExecutorStart() to handle plan invalidation. This
> new hook would be called before ExecutorStartCachedPlan() calls
> ExecutorStart(), so the original testing flow can still work.
Attached is a patch implementing this idea; it fixes the
delay_execution test breakage and applies on top of the v58 series of
patches.
However, as mentioned in my previous reply, extensions might already
need to adjust their ExecutorStart hook code to check whether an RT
index is in EState.es_unpruned_relids before accessing child relations
directly via ExecGetRangeTableRelation(). Given that, I can accept them
also adding an ExecPlanStillValid() check in their ExecutorStart hook.
So we may not want to add a new hook, even one meant only for testing.
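For illustration only, the kind of check an extension's hook would then carry might look like the standalone model below. Everything in it is an assumption: ToyEState and the toy_* functions are made up, and a plain 64-bit mask stands in for the Bitmapset that es_unpruned_relids actually is (real code would presumably test membership with bms_is_member()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the parts of EState that the hook would consult. */
typedef struct ToyEState
{
    uint64_t unpruned_relids; /* bit i set => RT index i survived pruning */
    bool     plan_valid;      /* models the ExecPlanStillValid() result */
} ToyEState;

/* Models a membership test against es_unpruned_relids. */
bool
toy_rel_is_unpruned(const ToyEState *estate, int rti)
{
    assert(rti >= 1 && rti < 64);
    return ((estate->unpruned_relids >> rti) & 1) != 0;
}

/*
 * Count the child relations a hook may safely open: none if the plan
 * was invalidated, otherwise only those that survived initial pruning.
 */
int
toy_hook_count_accessible(const ToyEState *estate, const int *rtis, int n)
{
    int count = 0;

    if (!estate->plan_valid)
        return 0;           /* bail out; the caller will replan */

    for (int i = 0; i < n; i++)
    {
        if (toy_rel_is_unpruned(estate, rtis[i]))
            count++;
    }
    return count;
}
```

The point of the sketch is that both checks sit in the same place in the hook, so adding the validity check costs extensions little beyond the adjustment they may need anyway.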
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] 0005-Remove-the-need-to-check-if-plan-is-valid-from-Execu.patch (23.0K, 2-0005-Remove-the-need-to-check-if-plan-is-valid-from-Execu.patch)
From a15f14aab6dd70dd13f4fbae6c996a6875a93c6a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 9 Dec 2024 12:34:04 +0900
Subject: [PATCH 5/5] Remove the need to check if plan is valid from
ExecutorStart hooks
For testing using delay_execution, a new hook
ExecutorStartCachedPlan_hook is added. This hook allows the
delay_execution module to block the execution of the cached plan to
allow a concurrent session to modify the objects referenced in the
plan, which is then detected when the locks are taken on prunable
relations in ExecutorStart().
---
contrib/auto_explain/auto_explain.c | 4 -
.../pg_stat_statements/pg_stat_statements.c | 4 -
src/backend/executor/execMain.c | 254 ++++++++++--------
src/backend/executor/execPartition.c | 6 +-
src/include/executor/execPartition.h | 3 +-
src/include/executor/executor.h | 34 +--
.../modules/delay_execution/delay_execution.c | 20 +-
.../expected/cached-plan-inval.out | 26 +-
8 files changed, 178 insertions(+), 173 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 8b5eaf3ef3..623a674f99 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -298,10 +298,6 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!ExecPlanStillValid(queryDesc->estate))
- return;
-
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index b11691ae26..49c657b3e0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -994,10 +994,6 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!ExecPlanStillValid(queryDesc->estate))
- return;
-
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9543d9490c..c2a9a0e86e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -68,6 +68,7 @@
/* Hooks for plugins to get control in ExecutorStart/Run/Finish/End */
ExecutorStart_hook_type ExecutorStart_hook = NULL;
+ExecutorStartCachedPlan_hook_type ExecutorStartCachedPlan_hook = NULL;
ExecutorRun_hook_type ExecutorRun_hook = NULL;
ExecutorFinish_hook_type ExecutorFinish_hook = NULL;
ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
@@ -123,6 +124,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ EState *estate;
+ MemoryContext oldcontext;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *rangeTable = plannedstmt->rtable;
+ CachedPlan *cachedplan = queryDesc->cplan;
+
+ /* sanity checks: queryDesc must not be started already */
+ Assert(queryDesc != NULL);
+ Assert(queryDesc->estate == NULL);
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -133,6 +144,117 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+ /*
+ * Build EState, switch into per-query memory context for startup.
+ */
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+ /*
+ * Fill in external parameters, if any, from queryDesc; and allocate
+ * workspace for internal parameters
+ */
+ estate->es_param_list_info = queryDesc->params;
+
+ if (queryDesc->plannedstmt->paramExecTypes != NIL)
+ {
+ int nParamExec;
+
+ nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
+ estate->es_param_exec_vals = (ParamExecData *)
+ palloc0(nParamExec * sizeof(ParamExecData));
+ }
+
+ /* We now require all callers to provide sourceText */
+ Assert(queryDesc->sourceText != NULL);
+ estate->es_sourceText = queryDesc->sourceText;
+
+ /*
+ * Fill in the query environment, if any, from queryDesc.
+ */
+ estate->es_queryEnv = queryDesc->queryEnv;
+
+ /*
+ * If non-read-only query, set the command ID to mark output tuples with
+ */
+ switch (queryDesc->operation)
+ {
+ case CMD_SELECT:
+
+ /*
+ * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
+ * tuples
+ */
+ if (queryDesc->plannedstmt->rowMarks != NIL ||
+ queryDesc->plannedstmt->hasModifyingCTE)
+ estate->es_output_cid = GetCurrentCommandId(true);
+
+ /*
+ * A SELECT without modifying CTEs can't possibly queue triggers,
+ * so force skip-triggers mode. This is just a marginal efficiency
+ * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
+ * all that expensive, but we might as well do it.
+ */
+ if (!queryDesc->plannedstmt->hasModifyingCTE)
+ eflags |= EXEC_FLAG_SKIP_TRIGGERS;
+ break;
+
+ case CMD_INSERT:
+ case CMD_DELETE:
+ case CMD_UPDATE:
+ case CMD_MERGE:
+ estate->es_output_cid = GetCurrentCommandId(true);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized operation code: %d",
+ (int) queryDesc->operation);
+ break;
+ }
+
+ /*
+ * Copy other important information into the EState
+ */
+ estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
+ estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
+ estate->es_top_eflags = eflags;
+ estate->es_instrument = queryDesc->instrument_options;
+ estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
+
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_unpruned_relids = bms_copy(plannedstmt->unprunableRelids);
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+
+ /*
+ * initialize the node's execution state
+ */
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+
+ /*
+ * Perform runtime "initial" pruning to identify which child subplans,
+ * corresponding to the children of plan nodes that contain
+ * PartitionPruneInfo such as Append, will not be executed. The results,
+ * which are bitmapsets of indexes of the child subplans that will be
+ * executed, are saved in es_part_prune_results. These results correspond
+ * to each PartitionPruneInfo entry, and the es_part_prune_results list is
+ * parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate, cachedplan);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ if (cachedplan && !CachedPlanValid(cachedplan))
+ return;
+
if (ExecutorStart_hook)
(*ExecutorStart_hook) (queryDesc, eflags);
else
@@ -159,6 +281,20 @@ void
ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
CachedPlanSource *plansource,
int query_index)
+{
+ if (ExecutorStartCachedPlan_hook)
+ (*ExecutorStartCachedPlan_hook) (queryDesc, eflags, plansource,
+ query_index);
+ else
+ standard_ExecutorStartCachedPlan(queryDesc, eflags, plansource,
+ query_index);
+
+}
+
+void
+standard_ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
{
if (unlikely(queryDesc->cplan == NULL))
elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
@@ -198,12 +334,12 @@ ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- EState *estate;
MemoryContext oldcontext;
+ EState *estate;
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
+ Assert(queryDesc->estate != NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -227,85 +363,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
ExecCheckXactReadOnly(queryDesc->plannedstmt);
- /*
- * Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
-
- oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
-
- /*
- * Fill in external parameters, if any, from queryDesc; and allocate
- * workspace for internal parameters
- */
- estate->es_param_list_info = queryDesc->params;
-
- if (queryDesc->plannedstmt->paramExecTypes != NIL)
- {
- int nParamExec;
-
- nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
- estate->es_param_exec_vals = (ParamExecData *)
- palloc0(nParamExec * sizeof(ParamExecData));
- }
-
- /* We now require all callers to provide sourceText */
- Assert(queryDesc->sourceText != NULL);
- estate->es_sourceText = queryDesc->sourceText;
-
- /*
- * Fill in the query environment, if any, from queryDesc.
- */
- estate->es_queryEnv = queryDesc->queryEnv;
-
- /*
- * If non-read-only query, set the command ID to mark output tuples with
- */
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
-
- /*
- * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
- * tuples
- */
- if (queryDesc->plannedstmt->rowMarks != NIL ||
- queryDesc->plannedstmt->hasModifyingCTE)
- estate->es_output_cid = GetCurrentCommandId(true);
-
- /*
- * A SELECT without modifying CTEs can't possibly queue triggers,
- * so force skip-triggers mode. This is just a marginal efficiency
- * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
- * all that expensive, but we might as well do it.
- */
- if (!queryDesc->plannedstmt->hasModifyingCTE)
- eflags |= EXEC_FLAG_SKIP_TRIGGERS;
- break;
-
- case CMD_INSERT:
- case CMD_DELETE:
- case CMD_UPDATE:
- case CMD_MERGE:
- estate->es_output_cid = GetCurrentCommandId(true);
- break;
-
- default:
- elog(ERROR, "unrecognized operation code: %d",
- (int) queryDesc->operation);
- break;
- }
-
- /*
- * Copy other important information into the EState
- */
- estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
- estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
- estate->es_top_eflags = eflags;
- estate->es_instrument = queryDesc->instrument_options;
- estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
-
/*
* Set up an AFTER-trigger statement context, unless told not to, or
* unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
@@ -313,6 +370,9 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
AfterTriggerBeginQuery();
+ estate = queryDesc->estate;
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
/*
* Initialize the plan state tree
*/
@@ -922,46 +982,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
- CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
-
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = cachedplan;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
- estate->es_unpruned_relids = bms_copy(plannedstmt->unprunableRelids);
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- *
- * This will also add the RT indexes of surviving leaf partitions to
- * es_unpruned_relids.
- */
- ExecDoInitialPruning(estate);
-
- if (!ExecPlanStillValid(estate))
- return;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 93cdae6f89..455e0d0f87 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1812,7 +1812,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* use the same index to retrieve the pruning results.
*/
void
-ExecDoInitialPruning(EState *estate)
+ExecDoInitialPruning(EState *estate, CachedPlan *cplan)
{
ListCell *lc;
List *locked_relids = NIL;
@@ -1838,7 +1838,7 @@ ExecDoInitialPruning(EState *estate)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- if (ExecShouldLockRelations(estate))
+ if (cplan && CachedPlanRequiresLocking(cplan))
{
int rtindex = -1;
@@ -1866,7 +1866,7 @@ ExecDoInitialPruning(EState *estate)
* Release the useless locks if the plan won't be executed. This is the
* same as what CheckCachedPlan() in plancache.c does.
*/
- if (!ExecPlanStillValid(estate))
+ if (cplan && !CachedPlanValid(cplan))
{
foreach(lc, locked_relids)
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index a0843481f7..95d886884c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -17,6 +17,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
+#include "utils/plancache.h"
/* See execPartition.c for the definitions. */
typedef struct PartitionDispatchData *PartitionDispatch;
@@ -136,7 +137,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
-void ExecDoInitialPruning(EState *estate);
+void ExecDoInitialPruning(EState *estate, CachedPlan *cplan);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6d72f7d9d6..1647461f0a 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -76,6 +76,12 @@
typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
+/* Hook for plugins to get control in ExecutorStartCachedPlan() */
+typedef void (*ExecutorStartCachedPlan_hook_type) (QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
+extern PGDLLIMPORT ExecutorStartCachedPlan_hook_type ExecutorStartCachedPlan_hook;
+
/* Hook for plugins to get control in ExecutorRun() */
typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
ScanDirection direction,
@@ -203,6 +209,9 @@ extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
CachedPlanSource *plansource,
int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void standard_ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -265,31 +274,6 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
-/*
- * Is the CachedPlan in es_cachedplan still valid?
- *
- * Called from InitPlan() because invalidation messages that affect the plan
- * might be received after locks have been taken on runtime-prunable relations.
- * The caller should take appropriate action if the plan has become invalid.
- */
-static inline bool
-ExecPlanStillValid(EState *estate)
-{
- return estate->es_cachedplan == NULL ? true :
- CachedPlanValid(estate->es_cachedplan);
-}
-
-/*
- * Locks are needed only if running a cached plan that might contain unlocked
- * relations, such as a reused generic plan.
- */
-static inline bool
-ExecShouldLockRelations(EState *estate)
-{
- return estate->es_cachedplan == NULL ? false :
- CachedPlanRequiresLocking(estate->es_cachedplan);
-}
-
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 44aa828fdf..a3bfaed372 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -41,7 +41,7 @@ static int executor_start_lock_id = 0;
/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
-static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
+static ExecutorStartCachedPlan_hook_type prev_ExecutorStartCachedPlan_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -79,7 +79,9 @@ delay_execution_planner(Query *parse, const char *query_string,
/* ExecutorStart_hook function to provide the desired delay */
static void
-delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+delay_execution_ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
{
/* If enabled, delay by taking and releasing the specified lock */
if (executor_start_lock_id != 0)
@@ -97,13 +99,15 @@ delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
/* Now start the executor, possibly via a previous hook user */
- if (prev_ExecutorStart_hook)
- prev_ExecutorStart_hook(queryDesc, eflags);
+ if (prev_ExecutorStartCachedPlan_hook)
+ prev_ExecutorStartCachedPlan_hook(queryDesc, eflags, plansource,
+ query_index);
else
- standard_ExecutorStart(queryDesc, eflags);
+ standard_ExecutorStartCachedPlan(queryDesc, eflags, plansource,
+ query_index);
if (executor_start_lock_id != 0)
- elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ elog(NOTICE, "Finished ExecutorStartCachedPlan(): CachedPlan is %s",
CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
}
@@ -139,6 +143,6 @@ _PG_init(void)
/* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
- prev_ExecutorStart_hook = ExecutorStart_hook;
- ExecutorStart_hook = delay_execution_ExecutorStart;
+ prev_ExecutorStartCachedPlan_hook = ExecutorStartCachedPlan_hook;
+ ExecutorStartCachedPlan_hook = delay_execution_ExecutorStartCachedPlan;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
index 5bfb2b33b3..165f865b7a 100644
--- a/src/test/modules/delay_execution/expected/cached-plan-inval.out
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -32,8 +32,7 @@ t
(1 row)
step s1exec: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
-------------------------------------
LockRows
@@ -48,7 +47,7 @@ starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
step s1prep2: SET plan_cache_mode = force_generic_plan;
PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
EXPLAIN (COSTS OFF) EXECUTE q2;
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
--------------------------------------------------
Append
@@ -81,8 +80,7 @@ t
(1 row)
step s1exec2: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
--------------------------------------------
Append
@@ -98,9 +96,9 @@ starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
step s1prep3: SET plan_cache_mode = force_generic_plan;
PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
EXPLAIN (COSTS OFF) EXECUTE q3;
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
--------------------------------------------------------------
Nested Loop
@@ -178,10 +176,9 @@ t
(1 row)
step s1exec3: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
-------------------------------------------------------------
Nested Loop
@@ -233,7 +230,7 @@ step s1prep4: SET plan_cache_mode = force_generic_plan;
SET enable_seqscan TO off;
PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
EXPLAIN (COSTS OFF) EXECUTE q4 (1);
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
---------------------------------------------------------------
Result
@@ -264,8 +261,7 @@ t
(1 row)
step s1exec4: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
---------------------------------------------
Result
--
2.43.0
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2024-12-12 07:58 ` Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2024-12-12 07:58 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Mon, Dec 9, 2024 at 4:10 PM Amit Langote <[email protected]> wrote:
> On Fri, Dec 6, 2024 at 5:18 PM Amit Langote <[email protected]> wrote:
> > On Thu, Dec 5, 2024 at 11:07 PM Tomas Vondra <[email protected]> wrote:
> > > On 12/5/24 12:28, Amit Langote wrote:
> > > > On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <[email protected]> wrote:
> > > >> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
> > > >>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> > > >>> might be a way to not have this issue, or at least notice the missing
> > > >>> call early.
> > > >>>
> > > >>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> > > >>> the retries internally? I mean, the extensions wouldn't need to check if
> > > >>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> > > >>> it might need some new arguments, but that's more obvious.
> > > >>
> > > >> One approach could be to move some code from standard_ExecutorStart()
> > > >> into ExecutorStart(). Specifically, the code responsible for setting
> > > >> up enough state in the EState to perform ExecDoInitialPruning(), which
> > > >> takes locks that might invalidate the plan. If the plan does become
> > > >> invalid, the hook and standard_ExecutorStart() are not called.
> > > >> Instead, the caller, ExecutorStartExt() in this case, creates a new
> > > >> plan.
> > > >>
> > > >> This avoids the need to add ExecPlanStillValid() checks anywhere,
> > > >> whether in core or extension code. However, it does mean accessing the
> > > >> PlannedStmt earlier than InitPlan(), but the current placement of the
> > > >> code is not exactly set in stone.
> > > >
> > > > I tried this approach and found that it essentially disables testing
> > > > of this patch using the delay_execution module, which relies on the
> > > > ExecutorStart_hook(). The way the testing works is that the hook in
> > > > delay_execution.c pauses the execution of a cached plan to allow a
> > > > concurrent session to drop an index referenced in the plan. When
> > > > unpaused, execution initialization resumes by calling
> > > > standard_ExecutorStart(). At this point, obtaining the lock on the
> > > > partition whose index has been dropped invalidates the plan, which the
> > > > hook detects and reports. It then also reports the successful
> > > > re-execution of an updated plan that no longer references the dropped
> > > > index. Hmm.
> > > >
> > >
> > > It's not clear to me why the change disables this testing, and I can't
> > > try without a patch. Could you explain?
> >
> > Sorry, PFA the delta patch for the change I described above. It
> > applies on top of v58 series of patches that I posted yesterday.
> > You'll notice that delay_execution test fails if you apply and do
> > check-world.
> >
> > As for how the change breaks the testing, here is a before and after
> > of the flow of a isolation test in
> > src/test/modules/delay_execution/specs/cached-plan-inval.spec (s1 is
> > the session used to run a cached plan, s2 to perform concurrent DDL
> > that invalidates the plan):
> >
> > * Before (working):
> >
> > 1. s2 takes advisory lock
> > 2. s1 runs cached plan -> goes to ExecutorStart_hook -> waits for the
> > advisory lock
> > 3. s2 drops an index referenced in the plan
> > 4. s2 unlocks advisory lock
> > 5. s1 locks unpruned partitions -> detects plan invalidation due to
> > dropped index.
> >
> > * After (stops working because initial pruning and locking are done
> > before calling ExecutorStart_hook):
> >
> > 1. s2 takes advisory lock
> > 2. s1 runs cached plan -> locks unpruned partitions -> goes to
> > ExecutorStart_hook to get advisory lock -> waits for advisory lock
> > 3. s2 drops an index referenced in the plan -> waits for lock on the
> > unpruned partition -> deadlock!
> >
> > One idea I had after sending the email yesterday is to introduce
> > ExecutorStartCachedPlan_hook for the advisory lock based waiting.
> > ExecutorStartCachedPlan() is the new function that you will find in
> > v58-0004 that wraps ExecutorStart() to handle plan invalidation. This
> > new hook would be called before ExecutorStartCachedPlan() calls
> > ExecutorStart(), so the original testing flow can still work.
>
> Here's that patch with this idea implemented that fixes the
> delay_execution test breakage. Applies on top of v58 series of
> patches.
>
> However, as mentioned in my previous reply, since extensions might
> need to adjust their ExecutorStart hook code to check if the RT index
> is in EState.es_unpruned_relids when accessing child relations
> directly via ExecGetRangeTableRelation(), I can accept them also
> adding a check for ExecPlanStillValid() in their ExecutorStart hook.
> So we may not want to add a new hook even if only for testing.
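The difference between the "Before" and "After" flows quoted above can be pictured as a wait-for graph between the two sessions. What follows is only a toy model in standalone C, not PostgreSQL's deadlock detector: WfGraph and wf_has_deadlock() are hypothetical names, and each session is assumed to wait for at most one other session.

```c
#include <assert.h>
#include <stdbool.h>

#define WF_MAX_SESSIONS 8

/* Hypothetical toy model: waits_for[i] is the session that i waits on, or -1. */
typedef struct WfGraph
{
	int			nsessions;
	int			waits_for[WF_MAX_SESSIONS];
} WfGraph;

/*
 * Follow the chain of waits starting at 'start'.  If the chain leads back
 * to 'start', the sessions on it can never make progress.
 */
bool
wf_has_deadlock(const WfGraph *g, int start)
{
	int			cur;

	assert(start >= 0 && start < g->nsessions);

	cur = g->waits_for[start];
	for (int steps = 0; steps < g->nsessions && cur != -1; steps++)
	{
		if (cur == start)
			return true;
		cur = g->waits_for[cur];
	}
	return false;
}
```

With session 0 standing for s1 and session 1 for s2, the "Before" flow is waits_for = {1, -1}: s1 waits for the advisory lock held by s2, and s2 waits for nobody. The "After" flow is waits_for = {1, 0}: s1 waits for the advisory lock while already holding the partition lock that s2 needs for its DROP INDEX, so the chain loops back.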
One thing I realized about the es_unpruned_relids bitmapset is that
ExecGetRangeTableRelation() should verify that any RT index passed to
it is a member of the bitmapset. If the RT index is not included, the
function should throw an error to catch such cases. I made that change
in 0004. This approach can help identify extensions that manipulate RT
entries belonging to potentially pruned partitions, provided they use
the ExecGetRangeTableRelation() interface to open those relations.
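As a rough illustration of that check, here is a standalone C sketch. It is not the code in 0004: ToyEState, ToyOpenResult and toy_get_range_table_relation() are hypothetical stand-ins, a 64-bit mask models the es_unpruned_relids bitmapset, and a return code models the error that the real function would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EState; not the PostgreSQL definition. */
typedef struct ToyEState
{
	uint64_t	unpruned_relids;	/* bit i set => RT index i is unpruned */
	int			range_table_size;	/* RT indexes are 1-based */
} ToyEState;

typedef enum ToyOpenResult
{
	TOY_OPEN_OK,
	TOY_OPEN_BAD_INDEX,
	TOY_OPEN_PRUNED
} ToyOpenResult;

/*
 * Models the membership test: refuse to open a relation whose RT index is
 * not in the unpruned set, because such a relation was never locked.
 */
ToyOpenResult
toy_get_range_table_relation(const ToyEState *estate, int rti)
{
	assert(estate != NULL);

	/* Range check first, so the shift below is always well defined. */
	if (rti < 1 || rti >= 64 || rti > estate->range_table_size)
		return TOY_OPEN_BAD_INDEX;

	if ((estate->unpruned_relids & (UINT64_C(1) << rti)) == 0)
		return TOY_OPEN_PRUNED;		/* the real function would error out */

	return TOY_OPEN_OK;
}
```

In this model, a three-entry range table whose unpruned set holds RT indexes 1 and 3 yields TOY_OPEN_PRUNED for index 2.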
To summarize how extensions can be affected:
1. Plan invalidation during standard_ExecutorStart(): A plan tree
originating from a CachedPlan can become invalid during
standard_ExecutorStart() due to locks taken on leaf partitions that
survive initial pruning. Extensions should be updated to handle this
scenario by checking ExecPlanStillValid(estate) immediately after
calling standard_ExecutorStart() in their ExecutorStart_hook. If it
returns false, the extensions should avoid further processing.
2. Validation of RT indexes: If the plan tree remains valid, any
direct manipulation of relations using RT indexes must first verify
that the RT index is present in the EState.es_unpruned_relids
bitmapset. This bitmapset includes: a) RT indexes of relations that
are originally unprunable (and locked during GetCachedPlan()), and
b) RT indexes of leaf partitions that survive initial partition
pruning. This step is crucial because pruned relations are not locked.
Additionally, with the update in 0004, attempting to open pruned
relations using ExecGetRangeTableRelation() will result in an error.
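The two points above might look roughly like this in an extension's hook. This is a standalone sketch under stated assumptions, not code from the patch set: every Demo* type and demo_* function is a hypothetical stand-in, a 64-bit mask models es_unpruned_relids, and demo_plan_still_valid() only mirrors the logic of ExecPlanStillValid() as it appears in the diff earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the PostgreSQL definitions. */
typedef struct DemoCachedPlan
{
	bool		is_valid;
} DemoCachedPlan;

typedef struct DemoEState
{
	DemoCachedPlan *cachedplan;		/* NULL if the plan is not a cached plan */
	uint64_t	unpruned_relids;	/* bit i set => RT index i is unpruned */
} DemoEState;

/* Same shape as ExecPlanStillValid(): no cached plan means "valid". */
bool
demo_plan_still_valid(const DemoEState *estate)
{
	return estate->cachedplan == NULL ? true : estate->cachedplan->is_valid;
}

/* Models standard_ExecutorStart(): taking locks may invalidate the plan. */
void
demo_standard_executor_start(DemoEState *estate, bool lock_invalidates_plan)
{
	if (estate->cachedplan != NULL && lock_invalidates_plan)
		estate->cachedplan->is_valid = false;
}

/*
 * Models an extension's ExecutorStart hook.  Returns the number of
 * relations it processed, or -1 if it stopped because the plan is invalid.
 */
int
demo_extension_executor_start(DemoEState *estate, bool lock_invalidates_plan,
							  const int *rtis, int nrtis)
{
	int			processed = 0;

	demo_standard_executor_start(estate, lock_invalidates_plan);

	/* Point 1: avoid further processing if the plan became invalid. */
	if (!demo_plan_still_valid(estate))
		return -1;

	/* Point 2: only touch relations whose RT index is unpruned. */
	for (int i = 0; i < nrtis; i++)
	{
		assert(rtis[i] >= 1 && rtis[i] < 64);
		if ((estate->unpruned_relids & (UINT64_C(1) << rtis[i])) != 0)
			processed++;
	}
	return processed;
}
```

The order matters in this sketch: the validity check comes before any use of the unpruned set, since a stale plan's pruning results are of no use to the caller.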
I’d love to hear from anyone maintaining executor hooks, such as those
from Timescale, Citus, or other extension developers. Please give this
patch set (0001-0004) a try and let me know if you run into any issues
or have feedback. 0005 is a sketch of an approach that eliminates the
need for extensions to check ExecPlanStillValid() in their
ExecutorStart_hook.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v59-0002-Initialize-PartitionPruneContexts-lazily.patch (16.9K, 2-v59-0002-Initialize-PartitionPruneContexts-lazily.patch)
From 5a4c58d81a75d0ec94f7cafa7feeef2439a39cc6 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 4 Dec 2024 16:16:41 +0900
Subject: [PATCH v59 2/5] Initialize PartitionPruneContexts lazily
This commit moves the initialization of PartitionPruneContexts for
both initial and exec pruning steps from CreatePartitionPruneState()
to find_matching_subplans_recurse(), where they are actually needed.
To track whether the context has been initialized and is ready for
use, a boolean field, initialized, has been added to PartitionPruneContext.
The primary motivation is to allow CreatePartitionPruneState() to be
called before ExecInitNode(). Right now, it's coupled with
ExecInitNode() because setting up the exec pruning context requires
access to the parent plan node's PlanState. By deferring context
creation to where it's actually needed, we break this dependency.
The ExprContext used for both pruning phases is now a standalone
context, independent of the parent PlanState.
This change will be useful in a future commit, which will move initial
pruning to occur outside ExecInitNode(), specifically before it is
called by InitPlan().
Reviewed-by: Robert Haas
Reviewed-by: Tom Lane
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execPartition.c | 151 +++++++++++++++++++--------
src/backend/partitioning/partprune.c | 7 +-
src/include/executor/execPartition.h | 12 +++
src/include/partitioning/partprune.h | 2 +
4 files changed, 123 insertions(+), 49 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 950fa3289c..f4d425cd45 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,18 +181,17 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
+static PartitionPruneState *CreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
-static void InitPartitionPruneContext(PartitionPruneContext *context,
+static void InitPartitionPruneContext(PartitionedRelPruningData *pprune,
+ PartitionPruneContext *context,
List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate,
- ExprContext *econtext);
+ PlanState *planstate);
static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
Bitmapset *initially_valid_subplans,
int n_total_subplans);
-static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
+static void find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
@@ -1823,7 +1822,14 @@ ExecInitPartitionPruning(PlanState *planstate,
ExecAssignExprContext(estate, planstate);
/* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+
+ /*
+ * Store PlanState for using it to initialize exec pruning contexts later
+ * in find_matching_subplans_recurse() where they are needed.
+ */
+ if (prunestate->do_exec_prune)
+ prunestate->parent_plan = planstate;
/*
* Perform an initial partition prune pass, if required.
@@ -1863,8 +1869,6 @@ ExecInitPartitionPruning(PlanState *planstate,
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
- *
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
* PartitionPruningData for each partitioning hierarchy (i.e., each sublist of
@@ -1875,16 +1879,24 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Note that the PartitionPruneContexts for both initial and exec pruning
+ * (which are stored in each PartitionedRelPruningData) are initialized lazily
+ * in find_matching_subplans_recurse() when used for the first time.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+
+ /*
+ * Expression context that will be used by partkey_datum_from_expr() to
+ * evaluate expressions for comparison against partition bounds.
+ */
+ ExprContext *econtext = CreateExprContext(estate);
/* For data reading, executor always includes detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1906,6 +1918,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->other_subplans = bms_copy(pruneinfo->other_subplans);
prunestate->do_initial_prune = false; /* may be set below */
prunestate->do_exec_prune = false; /* may be set below */
+ prunestate->parent_plan = NULL;
prunestate->num_partprunedata = n_part_hierarchies;
/*
@@ -1941,16 +1954,25 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
PartitionDesc partdesc;
- PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Used for initializing the expressions in initial pruning steps.
+ * For exec pruning steps, the parent plan node's PlanState's
+ * ps_ExprContext will be used.
*/
+ pprune->estate = estate;
+ pprune->econtext = econtext;
+
+ /* Remember Relation for use in InitPartitionPruneContext. */
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
- partkey = RelationGetPartitionKey(partrel);
+ pprune->partrel = partrel;
+
+ /*
+ * We can rely on the copy of partitioned table's partition
+ * descriptor appearing in its relcache entry, because that entry
+ * will be held open and locked for the duration of this executor
+ * run.
+ */
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
@@ -2061,32 +2083,26 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->present_parts = bms_copy(pinfo->present_parts);
/*
- * Initialize pruning contexts as needed. Note that we must skip
- * execution-time partition pruning in EXPLAIN (GENERIC_PLAN),
- * since parameter values may be missing.
+ * Pruning contexts (initial_context and exec_context) are
+ * initialized lazily in find_matching_subplans_recurse() when
+ * used for the first time.
+ *
+ * Note that we must skip execution-time partition pruning in
+ * EXPLAIN (GENERIC_PLAN), since parameter values may be missing.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
+ pprune->initial_context.initialized = false;
if (pinfo->initial_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->initial_context,
- pinfo->initial_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
- }
+
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ pprune->exec_context.initialized = false;
if (pinfo->exec_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
- }
/*
* Accumulate the IDs of all PARAM_EXEC Params affecting the
@@ -2107,17 +2123,41 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize a PartitionPruneContext for the given list of pruning steps.
*/
static void
-InitPartitionPruneContext(PartitionPruneContext *context,
+InitPartitionPruneContext(PartitionedRelPruningData *pprune,
+ PartitionPruneContext *context,
List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate,
- ExprContext *econtext)
+ PlanState *planstate)
{
int n_steps;
int partnatts;
ListCell *lc;
+ /*
+ * Use the ExprContext that CreatePartitionPruneState() should have
+ * created.
+ */
+ ExprContext *econtext = pprune->econtext;
+ EState *estate = pprune->estate;
+ MemoryContext oldcxt;
+ Relation partrel = pprune->partrel;
+ PartitionKey partkey;
+ PartitionDesc partdesc;
+
+ Assert(econtext != NULL);
+
+ /* Must allocate the needed stuff in the query lifetime context. */
+ oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+
+ /*
+ * We can rely on the copies of the partitioned table's partition key and
+ * partition descriptor appearing in its relcache entry, because that
+ * entry will be held open and locked for the duration of this executor
+ * run.
+ */
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ partrel);
+
n_steps = list_length(pruning_steps);
context->strategy = partkey->strategy;
@@ -2185,6 +2225,9 @@ InitPartitionPruneContext(PartitionPruneContext *context,
}
}
}
+
+ MemoryContextSwitchTo(oldcxt);
+ context->initialized = true;
}
/*
@@ -2348,12 +2391,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* recursing to other (lower-level) parents as needed.
*/
pprune = &prunedata->partrelprunedata[0];
- find_matching_subplans_recurse(prunedata, pprune, initial_prune,
+ find_matching_subplans_recurse(prunestate->parent_plan,
+ prunedata, pprune, initial_prune,
&result);
/* Expression eval may have used space in ExprContext too */
- if (pprune->exec_pruning_steps)
+ if (pprune->exec_context.initialized)
+ {
+ Assert(pprune->exec_pruning_steps != NIL);
ResetExprContext(pprune->exec_context.exprcontext);
+ }
}
/* Add in any subplans that partition pruning didn't account for */
@@ -2376,7 +2423,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* Adds valid (non-prunable) subplan IDs to *validsubplans
*/
static void
-find_matching_subplans_recurse(PartitionPruningData *prunedata,
+find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans)
@@ -2393,11 +2441,27 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
* level.
*/
if (initial_prune && pprune->initial_pruning_steps)
+ {
+ /* Initialize initial_context if not already done. */
+ if (unlikely(!pprune->initial_context.initialized))
+ InitPartitionPruneContext(pprune,
+ &pprune->initial_context,
+ pprune->initial_pruning_steps,
+ parent_plan);
partset = get_matching_partitions(&pprune->initial_context,
pprune->initial_pruning_steps);
+ }
else if (!initial_prune && pprune->exec_pruning_steps)
+ {
+ /* Initialize exec_context if not already done. */
+ if (unlikely(!pprune->exec_context.initialized))
+ InitPartitionPruneContext(pprune,
+ &pprune->exec_context,
+ pprune->exec_pruning_steps,
+ parent_plan);
partset = get_matching_partitions(&pprune->exec_context,
pprune->exec_pruning_steps);
+ }
else
partset = pprune->present_parts;
@@ -2413,7 +2477,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
int partidx = pprune->subpart_map[i];
if (partidx >= 0)
- find_matching_subplans_recurse(prunedata,
+ find_matching_subplans_recurse(parent_plan,
+ prunedata,
&prunedata->partrelprunedata[partidx],
initial_prune, validsubplans);
else
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index ca5467104d..ae1d69f96c 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -3783,13 +3783,8 @@ partkey_datum_from_expr(PartitionPruneContext *context,
/*
* We should never see a non-Const in a step unless the caller has
* passed a valid ExprContext.
- *
- * When context->planstate is valid, context->exprcontext is same as
- * context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL || context->exprcontext != NULL);
- Assert(context->planstate == NULL ||
- (context->exprcontext == context->planstate->ps_ExprContext));
+ Assert(context->exprcontext != NULL);
exprstate = context->exprstates[stateidx];
ectx = context->exprcontext;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 33d922fe8d..7e470c82f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -42,6 +42,10 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* PartitionedRelPruneInfo (see plannodes.h); though note that here,
* subpart_map contains indexes into PartitionPruningData.partrelprunedata[].
*
+ * estate The EState for the query doing run-time pruning
+ * partrel Partitioned table Relation; obtained by
+ * ExecGetRangeTableRelation(estate, rti), where
+ * rti is PartitionedRelPruneInfo.rtindex.
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
@@ -51,6 +55,8 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* perform executor startup pruning.
* exec_pruning_steps List of PartitionPruneSteps used to
* perform per-scan pruning.
+ * econtext ExprContext to use for evaluating partition
+ * key
* initial_context If initial_pruning_steps isn't NIL, contains
* the details needed to execute those steps.
* exec_context If exec_pruning_steps isn't NIL, contains
@@ -58,12 +64,15 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
*/
typedef struct PartitionedRelPruningData
{
+ EState *estate;
+ Relation partrel;
int nparts;
int *subplan_map;
int *subpart_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
+ ExprContext *econtext;
PartitionPruneContext initial_context;
PartitionPruneContext exec_context;
} PartitionedRelPruningData;
@@ -105,6 +114,8 @@ typedef struct PartitionPruningData
* startup (at any hierarchy level).
* do_exec_prune true if pruning should be performed during
* executor run (at any hierarchy level).
+ * parent_plan Parent plan node's PlanState used to initialize
+ * expression contained in "exec" pruning steps.
* num_partprunedata Number of items in "partprunedata" array.
* partprunedata Array of PartitionPruningData pointers for the plan's
* partitioned relation(s), one for each partitioning
@@ -117,6 +128,7 @@ typedef struct PartitionPruneState
MemoryContext prune_context;
bool do_initial_prune;
bool do_exec_prune;
+ PlanState *parent_plan;
int num_partprunedata;
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 6922e04430..0cbcb4fb4e 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -26,6 +26,7 @@ struct RelOptInfo;
* Stores information needed at runtime for pruning computations
* related to a single partitioned table.
*
+ * initialized Has the information in this struct been initialized?
* strategy Partition strategy, e.g. LIST, RANGE, HASH.
* partnatts Number of columns in the partition key.
* nparts Number of partitions in this partitioned table.
@@ -48,6 +49,7 @@ struct RelOptInfo;
*/
typedef struct PartitionPruneContext
{
+ bool initialized;
char strategy;
int partnatts;
int nparts;
--
2.43.0
[application/octet-stream] v59-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch (20.4K, 3-v59-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch)
From da516d9afbad17904f5377e7b1ca9a5c0300837e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 4 Dec 2024 16:16:29 +0900
Subject: [PATCH v59 1/5] Move PartitionPruneInfo out of plan nodes into
PlannedStmt

This change moves PartitionPruneInfo from individual plan nodes to
PlannedStmt, enabling runtime initial pruning to be performed across
the entire plan tree without traversing it to find nodes containing
PartitionPruneInfos.

The PartitionPruneInfo pointer fields in Append and MergeAppend nodes
have been replaced with an integer index that points to a list of
PartitionPruneInfos within PlannedStmt, which now holds the
PartitionPruneInfos for all subqueries.

A bitmapset field has been added to PartitionPruneInfo to store the RT
indexes that correspond to the apprelids field in Append or
MergeAppend. This ensures that the execution pruning logic
cross-checks that it operates on the correct plan node.

Duplicated code in set_append_references() and
set_mergeappend_references() has been moved to a new function,
register_partpruneinfo(), which both updates the RT indexes by adding
rtoffset and adds the PartitionPruneInfo to the global list in
PlannerGlobal.

Reviewed-by: Alvaro Herrera
Reviewed-by: Robert Haas
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
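Note (not part of the commit message): the index-plus-cross-check scheme
described above can be illustrated with a small standalone sketch. This is
only an approximation under stated assumptions, not the patch's code: every
type and function name below (RelidSet, PruneInfo, Stmt, AppendNode,
lookup_pruneinfo) is a hypothetical stand-in, a 64-bit mask takes the place
of Bitmapset, and a status code is returned where the patch raises
elog(ERROR).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: one bit per range-table index. */
typedef uint64_t RelidSet;

/* Hypothetical stand-in for PartitionPruneInfo; only the checked field. */
typedef struct PruneInfo
{
    RelidSet    relids;
} PruneInfo;

/* Hypothetical stand-in for the statement-level list of pruning infos. */
typedef struct Stmt
{
    const PruneInfo *prune_infos;
    int         n_prune_infos;
} Stmt;

/* Hypothetical stand-in for Append/MergeAppend: an index, not a pointer. */
typedef struct AppendNode
{
    RelidSet    apprelids;
    int         part_prune_index;   /* -1 means no run-time pruning */
} AppendNode;

typedef enum LookupStatus
{
    LOOKUP_NONE,                /* node does no run-time pruning */
    LOOKUP_OK,                  /* entry found and relids match */
    LOOKUP_MISMATCH             /* entry belongs to other relations */
} LookupStatus;

/*
 * Fetch the pruning info a node refers to and cross-check that its relids
 * equal the node's apprelids.  *out is set only when LOOKUP_OK is returned.
 */
static LookupStatus
lookup_pruneinfo(const Stmt *stmt, const AppendNode *node,
                 const PruneInfo **out)
{
    const PruneInfo *pinfo;

    *out = NULL;
    if (node->part_prune_index < 0)
        return LOOKUP_NONE;

    assert(node->part_prune_index < stmt->n_prune_infos);
    pinfo = &stmt->prune_infos[node->part_prune_index];

    if (pinfo->relids != node->apprelids)
        return LOOKUP_MISMATCH;

    *out = pinfo;
    return LOOKUP_OK;
}
```

The point of the cross-check is that an integer index carries no type-level
guarantee that it still refers to the right entry, so comparing relids is a
cheap way to catch a planner-side bookkeeping mistake at executor startup.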
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 17 ++++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/createplan.c | 23 +++----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 85 ++++++++++++++++---------
src/backend/partitioning/partprune.c | 19 ++++--
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 16 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 131 insertions(+), 61 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5ca856fd27..b40fe38178 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -856,6 +856,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..b01a2fdfdd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -181,6 +181,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..950fa3289c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1786,6 +1786,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'relids' identifies the relation to which both the parent plan and the
+ * PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1798,11 +1801,23 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo;
+
+ /* Obtain the pruneinfo we need. */
+ pruneinfo = list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
+ part_prune_index);
+
+ /* Its relids better match the plan node's or the planner messed up. */
+ if (!bms_equal(relids, pruneinfo->relids))
+ elog(ERROR, "wrong pruneinfo with relids=%s found at part_prune_index=%d contained in plan node with relids=%s",
+ bmsToString(pruneinfo->relids), part_prune_index,
+ bmsToString(relids));
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 740e8fb148..bc905a0cdc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -118,6 +118,7 @@ CreateExecutorState(void)
estate->es_rowmarks = NULL;
estate->es_rteperminfos = NIL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..de7ebab5c2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3ed91808dd 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 178c572b02..8f209c2d2f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1227,7 +1227,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1378,6 +1377,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1401,16 +1403,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1449,7 +1449,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1542,6 +1541,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1557,13 +1559,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b665a7762e..9c253e864a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -557,6 +557,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6d23df108d..9f13243d54 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1731,6 +1731,47 @@ set_customscan_references(PlannerInfo *root,
cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
}
+/*
+ * register_partpruneinfo
+ * Subroutine for set_append_references and set_mergeappend_references
+ *
+ * Add the PartitionPruneInfo from root->partPruneInfos at the given index
+ * into PlannerGlobal->partPruneInfos and return its index there.
+ *
+ * Also update the RT indexes present in PartitionedRelPruneInfos to add the
+ * offset.
+ */
+static int
+register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
+{
+ PlannerGlobal *glob = root->glob;
+ PartitionPruneInfo *pinfo;
+ ListCell *l;
+
+ Assert(part_prune_index >= 0 &&
+ part_prune_index < list_length(root->partPruneInfos));
+ pinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+ part_prune_index);
+
+ pinfo->relids = offset_relid_set(pinfo->relids, rtoffset);
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pinfo);
+
+ return list_length(glob->partPruneInfos) - 1;
+}
+
/*
* set_append_references
* Do set_plan_references processing on an Append
@@ -1783,21 +1824,13 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index =
+ register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1859,21 +1892,13 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index =
+ register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 4e12ae5d1e..ca5467104d 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -207,16 +207,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -330,10 +334,11 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->relids = bms_copy(parentrel->relids);
pruneinfo->prune_infos = prunerelinfos;
/*
@@ -356,7 +361,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..33d922fe8d 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,7 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 182a6956bb..b1471e68fe 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -639,6 +639,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* List of PartitionPruneInfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45f..f8a4cd42c6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* "flat" list of PartitionPruneInfos */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -559,6 +562,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 52f29bcdb6..ef89927471 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in the
+ * plan */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
@@ -276,8 +279,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -311,8 +314,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1414,6 +1417,10 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * relids RelOptInfo.relids of the parent plan node (e.g. Append
+ * or MergeAppend) to which this PartitionPruneInfo node
+ * belongs. The pruning logic ensures that this matches
+ * the parent plan node's apprelids.
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1426,6 +1433,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal, no_query_jumble)
NodeTag type;
+ Bitmapset *relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..6922e04430 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
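For readers following the setrefs.c part of the 0001 patch above: the index
remapping done by register_partpruneinfo() can be sketched in isolation
roughly as follows. This is a simplified illustration, not the patch's code.
The fixed-size arrays replace PostgreSQL's List, and the names InfoList,
PruneInfo and register_info are hypothetical. What it is meant to show is
that an entry's index in the per-query list is replaced by its index in the
global list, and its range-table indexes are shifted by rtoffset on the way.

```c
#include <assert.h>

#define MAX_PRUNE_INFOS 8
#define MAX_RELS 4

/* Hypothetical stand-in for a PartitionPruneInfo and its RT indexes. */
typedef struct PruneInfo
{
    int         nrels;
    int         rtindex[MAX_RELS];  /* local to one (sub)query at first */
} PruneInfo;

/* Hypothetical stand-in for a List of PartitionPruneInfo pointers. */
typedef struct InfoList
{
    int         n;
    PruneInfo  *items[MAX_PRUNE_INFOS];
} InfoList;

/*
 * Append the entry found at local_index in the per-query list to the global
 * list, shifting its range-table indexes by rtoffset.  Returns the entry's
 * 0-based index in the global list, which the plan node stores from then on.
 */
static int
register_info(InfoList *global, const InfoList *local, int local_index,
              int rtoffset)
{
    PruneInfo  *pinfo;

    assert(local_index >= 0 && local_index < local->n);
    assert(global->n < MAX_PRUNE_INFOS);

    pinfo = local->items[local_index];
    for (int i = 0; i < pinfo->nrels; i++)
        pinfo->rtindex[i] += rtoffset;

    global->items[global->n] = pinfo;
    return global->n++;
}
```

Two subqueries can each have an entry at local index 0; after registration
they occupy distinct slots in the global list, which is why the plan node's
index has to be overwritten with the returned value.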
[application/octet-stream] v59-0003-Perform-runtime-initial-pruning-outside-ExecInit.patch (14.6K, 4-v59-0003-Perform-runtime-initial-pruning-outside-ExecInit.patch)
From 638c7297f63d530ce29370fffa20fcd34ee8b450 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 4 Dec 2024 16:16:49 +0900
Subject: [PATCH v59 3/5] Perform runtime initial pruning outside
ExecInitNode()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit follows up on the previous change that moved
PartitionPruneInfos out of individual plan nodes into a list in
PlannedStmt. It moves the initialization of PartitionPruneStates
and runtime initial pruning out of ExecInitNode() and into a new
routine, ExecDoInitialPruning(), which is called by InitPlan()
before ExecInitNode() is invoked on the main plan tree and subplans.

ExecDoInitialPruning() performs the initial pruning and saves the
result, a bitmapset of indexes for the surviving child subnodes, in
es_part_prune_results, a list in EState. The PartitionPruneStates
created for initial pruning are also saved in es_part_prune_states,
another list in EState, for later use during exec pruning. Both lists
are parallel to es_part_prune_infos (which holds the
PartitionPruneInfos from PlannedStmt), allowing them to share the
same index.

Reviewed-by: Robert Haas
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
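Note (not part of the commit message): the "parallel lists sharing one
index" arrangement described above might be pictured with the standalone
sketch below. It makes several simplifying assumptions: arrays replace the
EState lists, a 64-bit mask replaces Bitmapset, a have_result flag plays the
role of the NULL placeholder entry, and the outcome of the initial pruning
steps is simply stored in the input. All names (ExecState, PruneInfo,
do_initial_pruning, valid_subplans) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFOS 8

/* Hypothetical stand-in for Bitmapset: bit i set => subplan i survives. */
typedef uint64_t SubplanSet;

typedef struct PruneInfo
{
    bool        has_initial_steps;
    SubplanSet  initial_survivors;  /* pretend outcome of the initial steps */
} PruneInfo;

/* Three arrays indexed alike, mirroring the three parallel lists. */
typedef struct ExecState
{
    int         n_infos;
    const PruneInfo *infos[MAX_INFOS];
    bool        have_result[MAX_INFOS]; /* false acts as the NULL entry */
    SubplanSet  results[MAX_INFOS];
} ExecState;

/* Run once, before any plan node is initialized. */
static void
do_initial_pruning(ExecState *es)
{
    for (int i = 0; i < es->n_infos; i++)
    {
        const PruneInfo *pinfo = es->infos[i];

        /* Always record a slot so that indexes stay aligned. */
        es->have_result[i] = pinfo->has_initial_steps;
        es->results[i] = pinfo->has_initial_steps ?
            pinfo->initial_survivors : 0;
    }
}

/* Run at node initialization: which subplans must be initialized? */
static SubplanSet
valid_subplans(const ExecState *es, int index, int n_total_subplans)
{
    assert(index >= 0 && index < es->n_infos);
    assert(n_total_subplans > 0 && n_total_subplans < 64);

    if (es->have_result[index])
        return es->results[index];

    /* No initial pruning was done, so every subplan is needed. */
    return (UINT64_C(1) << n_total_subplans) - 1;
}
```

Recording a placeholder even when no initial pruning happens is what lets
the node-level code use its part_prune_index directly, with no separate
mapping between the lists.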
src/backend/executor/execMain.c | 12 +++
src/backend/executor/execPartition.c | 130 ++++++++++++++++++-------
src/backend/executor/nodeAppend.c | 10 +-
src/backend/executor/nodeMergeAppend.c | 10 +-
src/include/executor/execPartition.h | 11 ++-
src/include/nodes/execnodes.h | 2 +
6 files changed, 125 insertions(+), 50 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b40fe38178..5dc46f2e95 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -46,6 +46,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
@@ -858,6 +859,17 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ /*
+ * Perform runtime "initial" pruning to identify which child subplans,
+ * corresponding to the children of plan nodes that contain
+ * PartitionPruneInfo such as Append, will not be executed. The results,
+ * which are bitmapsets of indexes of the child subplans that will be
+ * executed, are saved in es_part_prune_results. These results correspond
+ * to each PartitionPruneInfo entry, and the es_part_prune_results list is
+ * parallel to es_part_prune_infos.
+ */
+ ExecDoInitialPruning(estate);
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f4d425cd45..46dd1c77a3 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1761,48 +1761,105 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
- * ExecInitPartitionPruning:
- * Creates the PartitionPruneState required by ExecFindMatchingSubPlans.
- * Details stored include how to map the partition index returned by the
- * partition pruning code into subplan indexes. Also determines the set
- * of subplans to initialize considering the result of performing initial
- * pruning steps if any. Maps in PartitionPruneState are updated to
+ * ExecDoInitialPruning:
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode() for
+ * all plan nodes that contain a PartitionPruneInfo.
+ *
+ * ExecInitPartitionExecPruning:
+ * Updates the PartitionPruneState found at given part_prune_index in
+ * EState.es_part_prune_states for use during "exec" pruning if required.
+ * Also returns the set of subplans to initialize that would be stored at
+ * part_prune_index in EState.es_part_prune_results by
+ * ExecDoInitialPruning(). Maps in PartitionPruneState are updated to
* account for initial pruning possibly having eliminated some of the
* subplans.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
- * called during ExecInitPartitionPruning() to find the initially
- * matching subplans based on performing the initial pruning steps and
- * then must be called again each time the value of a Param listed in
+ * called during ExecDoInitialPruning() to find the initially matching
+ * subplans based on performing the initial pruning steps and then must be
+ * called again each time the value of a Param listed in
* PartitionPruneState's 'execparamids' changes.
*-------------------------------------------------------------------------
*/
/*
- * ExecInitPartitionPruning
- * Initialize data structure needed for run-time partition pruning and
- * do initial pruning if needed
+ * ExecDoInitialPruning
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode() for
+ * plan nodes that support partition pruning.
+ *
+ * This function iterates over each PartitionPruneInfo entry in
+ * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
+ * and adds it to es_part_prune_states, where ExecInitPartitionExecPruning() can
+ * access it for use during "exec" pruning.
+ *
+ * If initial pruning steps exist for a PartitionPruneInfo entry, this function
+ * executes those pruning steps and stores the result as a bitmapset of valid
+ * child subplans, identifying which subplans should be initialized for
+ * execution. The results are saved in estate->es_part_prune_results.
+ *
+ * If no initial pruning is performed for a given PartitionPruneInfo, a NULL
+ * entry is still added to es_part_prune_results to maintain alignment with
+ * es_part_prune_infos. This ensures that ExecInitPartitionExecPruning() can
+ * use the same index to retrieve the pruning results.
+ */
+void
+ExecDoInitialPruning(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans = NULL;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+
+ /*
+ * Perform initial pruning steps, if any, and save the result
+ * bitmapset or NULL as described in the header comment.
+ */
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ validsubplans);
+ }
+}
+
+/*
+ * ExecInitPartitionExecPruning
+ * Initialize the data structures needed for runtime "exec" partition
+ * pruning and return the result of initial pruning, if available.
*
* 'relids' identifies the relation to which both the parent plan and the
* PartitionPruneInfo given by 'part_prune_index' belong.
*
- * On return, *initially_valid_subplans is assigned the set of indexes of
- * child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * The PartitionPruneState would have been created by ExecDoInitialPruning()
+ * and stored as the part_prune_index'th element of EState.es_part_prune_states.
*
- * If subplans are indeed pruned, subplan_map arrays contained in the returned
- * PartitionPruneState are re-sequenced to not count those, though only if the
- * maps will be needed for subsequent execution pruning passes.
+ * On return, *initially_valid_subplans is assigned the set of indexes of child
+ * subplans that must be initialized. Initial pruning would have been performed
+ * by ExecDoInitialPruning() if necessary, and the bitmapset of surviving
+ * subplans' indexes would have been stored as the part_prune_index'th element
+ * of EState.es_part_prune_results.
+ *
+ * If subplans were pruned during initial pruning, the subplan_map arrays in
+ * the returned PartitionPruneState are re-sequenced to exclude those subplans,
+ * but only if the maps will be needed for subsequent execution pruning passes.
*/
PartitionPruneState *
-ExecInitPartitionPruning(PlanState *planstate,
- int n_total_subplans,
- int part_prune_index,
- Bitmapset *relids,
- Bitmapset **initially_valid_subplans)
+ExecInitPartitionExecPruning(PlanState *planstate,
+ int n_total_subplans,
+ int part_prune_index,
+ Bitmapset *relids,
+ Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
@@ -1818,11 +1875,12 @@ ExecInitPartitionPruning(PlanState *planstate,
bmsToString(pruneinfo->relids), part_prune_index,
bmsToString(relids));
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
-
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ /*
+ * ExecDoInitialPruning() must have initialized the PartitionPruneState to
+ * perform the initial pruning.
+ */
+ prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
+ Assert(prunestate != NULL);
/*
* Store PlanState for using it to initialize exec pruning contexts later
@@ -1831,11 +1889,11 @@ ExecInitPartitionPruning(PlanState *planstate,
if (prunestate->do_exec_prune)
prunestate->parent_plan = planstate;
- /*
- * Perform an initial partition prune pass, if required.
- */
+ /* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = list_nth_node(Bitmapset,
+ estate->es_part_prune_results,
+ part_prune_index);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1846,8 +1904,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/*
* Re-sequence subplan indexes contained in prunestate to account for any
- * that were removed above due to initial pruning. No need to do this if
- * no steps were removed.
+ * that were removed due to initial pruning. No need to do this if no
+ * partitions were removed.
*/
if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
@@ -1868,6 +1926,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
+ * Details stored include how to map the partition index returned by the
+ * partition pruning code into subplan indexes.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index de7ebab5c2..b77ff84840 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -143,11 +143,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans to initialize (validsubplans) by taking into account the
* result of performing initial pruning if any.
*/
- prunestate = ExecInitPartitionPruning(&appendstate->ps,
- list_length(node->appendplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
+ prunestate = ExecInitPartitionExecPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_index,
+ node->apprelids,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3ed91808dd..e2032afcb7 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -91,11 +91,11 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans to initialize (validsubplans) by taking into account the
* result of performing initial pruning if any.
*/
- prunestate = ExecInitPartitionPruning(&mergestate->ps,
- list_length(node->mergeplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
+ prunestate = ExecInitPartitionExecPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_index,
+ node->apprelids,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 7e470c82f6..0b34784922 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -133,11 +133,12 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
-extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
- int n_total_subplans,
- int part_prune_index,
- Bitmapset *relids,
- Bitmapset **initially_valid_subplans);
+void ExecDoInitialPruning(EState *estate);
+extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
+ int n_total_subplans,
+ int part_prune_index,
+ Bitmapset *relids,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b1471e68fe..f93061c7bf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -640,6 +640,8 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
+ List *es_part_prune_states; /* List of PartitionPruneState */
+ List *es_part_prune_results; /* List of Bitmapset */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
--
2.43.0
[application/octet-stream] v59-0005-Remove-the-need-to-check-if-plan-is-valid-from-E.patch (24.4K, 5-v59-0005-Remove-the-need-to-check-if-plan-is-valid-from-E.patch)
From e9eb720d23c56f8b950b96128bc52b690711dd65 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 9 Dec 2024 12:34:04 +0900
Subject: [PATCH v59 5/5] Remove the need to check if plan is valid from
ExecutorStart hooks
For testing using delay_execution, a new hook
ExecutorStartCachedPlan_hook is added. This hook allows the
delay_execution module to block the execution of the cached plan to
allow a concurrent session to modify the objects referenced in the
plan, which is then detected when the locks are taken on prunable
relations in ExecutorStart().
---
contrib/auto_explain/auto_explain.c | 4 -
.../pg_stat_statements/pg_stat_statements.c | 4 -
src/backend/executor/execMain.c | 257 ++++++++++--------
src/backend/executor/execPartition.c | 7 +-
src/include/executor/execPartition.h | 3 +-
src/include/executor/executor.h | 34 +--
src/include/nodes/execnodes.h | 1 -
.../modules/delay_execution/delay_execution.c | 20 +-
.../expected/cached-plan-inval.out | 26 +-
9 files changed, 179 insertions(+), 177 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 8b5eaf3ef3..623a674f99 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -298,10 +298,6 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!ExecPlanStillValid(queryDesc->estate))
- return;
-
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index b11691ae26..49c657b3e0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -994,10 +994,6 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!ExecPlanStillValid(queryDesc->estate))
- return;
-
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2dfa7eff54..b617a9ca8d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -68,6 +68,7 @@
/* Hooks for plugins to get control in ExecutorStart/Run/Finish/End */
ExecutorStart_hook_type ExecutorStart_hook = NULL;
+ExecutorStartCachedPlan_hook_type ExecutorStartCachedPlan_hook = NULL;
ExecutorRun_hook_type ExecutorRun_hook = NULL;
ExecutorFinish_hook_type ExecutorFinish_hook = NULL;
ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
@@ -123,6 +124,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ EState *estate;
+ MemoryContext oldcontext;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *rangeTable = plannedstmt->rtable;
+ CachedPlan *cachedplan = queryDesc->cplan;
+
+ /* sanity checks: queryDesc must not be started already */
+ Assert(queryDesc != NULL);
+ Assert(queryDesc->estate == NULL);
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -133,6 +144,117 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+ /*
+ * Build EState, switch into per-query memory context for startup.
+ */
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+ /*
+ * Fill in external parameters, if any, from queryDesc; and allocate
+ * workspace for internal parameters
+ */
+ estate->es_param_list_info = queryDesc->params;
+
+ if (queryDesc->plannedstmt->paramExecTypes != NIL)
+ {
+ int nParamExec;
+
+ nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
+ estate->es_param_exec_vals = (ParamExecData *)
+ palloc0(nParamExec * sizeof(ParamExecData));
+ }
+
+ /* We now require all callers to provide sourceText */
+ Assert(queryDesc->sourceText != NULL);
+ estate->es_sourceText = queryDesc->sourceText;
+
+ /*
+ * Fill in the query environment, if any, from queryDesc.
+ */
+ estate->es_queryEnv = queryDesc->queryEnv;
+
+ /*
+ * If non-read-only query, set the command ID to mark output tuples with
+ */
+ switch (queryDesc->operation)
+ {
+ case CMD_SELECT:
+
+ /*
+ * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
+ * tuples
+ */
+ if (queryDesc->plannedstmt->rowMarks != NIL ||
+ queryDesc->plannedstmt->hasModifyingCTE)
+ estate->es_output_cid = GetCurrentCommandId(true);
+
+ /*
+ * A SELECT without modifying CTEs can't possibly queue triggers,
+ * so force skip-triggers mode. This is just a marginal efficiency
+ * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
+ * all that expensive, but we might as well do it.
+ */
+ if (!queryDesc->plannedstmt->hasModifyingCTE)
+ eflags |= EXEC_FLAG_SKIP_TRIGGERS;
+ break;
+
+ case CMD_INSERT:
+ case CMD_DELETE:
+ case CMD_UPDATE:
+ case CMD_MERGE:
+ estate->es_output_cid = GetCurrentCommandId(true);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized operation code: %d",
+ (int) queryDesc->operation);
+ break;
+ }
+
+ /*
+ * Copy other important information into the EState
+ */
+ estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
+ estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
+ estate->es_top_eflags = eflags;
+ estate->es_instrument = queryDesc->instrument_options;
+ estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
+
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+
+ /*
+ * initialize the node's execution state
+ */
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
+ bms_copy(plannedstmt->unprunableRelids));
+
+ /*
+ * Perform runtime "initial" pruning to identify which child subplans,
+ * corresponding to the children of plan nodes that contain
+ * PartitionPruneInfo such as Append, will not be executed. The results,
+ * which are bitmapsets of indexes of the child subplans that will be
+ * executed, are saved in es_part_prune_results. These results correspond
+ * to each PartitionPruneInfo entry, and the es_part_prune_results list is
+ * parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate, cachedplan);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ if (cachedplan && !CachedPlanValid(cachedplan))
+ return;
+
if (ExecutorStart_hook)
(*ExecutorStart_hook) (queryDesc, eflags);
else
@@ -159,6 +281,20 @@ void
ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
CachedPlanSource *plansource,
int query_index)
+{
+ if (ExecutorStartCachedPlan_hook)
+ (*ExecutorStartCachedPlan_hook) (queryDesc, eflags, plansource,
+ query_index);
+ else
+ standard_ExecutorStartCachedPlan(queryDesc, eflags, plansource,
+ query_index);
+
+}
+
+void
+standard_ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
{
if (unlikely(queryDesc->cplan == NULL))
elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
@@ -198,12 +334,12 @@ ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- EState *estate;
MemoryContext oldcontext;
+ EState *estate;
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
+ Assert(queryDesc->estate != NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -227,85 +363,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
ExecCheckXactReadOnly(queryDesc->plannedstmt);
- /*
- * Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
-
- oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
-
- /*
- * Fill in external parameters, if any, from queryDesc; and allocate
- * workspace for internal parameters
- */
- estate->es_param_list_info = queryDesc->params;
-
- if (queryDesc->plannedstmt->paramExecTypes != NIL)
- {
- int nParamExec;
-
- nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
- estate->es_param_exec_vals = (ParamExecData *)
- palloc0(nParamExec * sizeof(ParamExecData));
- }
-
- /* We now require all callers to provide sourceText */
- Assert(queryDesc->sourceText != NULL);
- estate->es_sourceText = queryDesc->sourceText;
-
- /*
- * Fill in the query environment, if any, from queryDesc.
- */
- estate->es_queryEnv = queryDesc->queryEnv;
-
- /*
- * If non-read-only query, set the command ID to mark output tuples with
- */
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
-
- /*
- * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
- * tuples
- */
- if (queryDesc->plannedstmt->rowMarks != NIL ||
- queryDesc->plannedstmt->hasModifyingCTE)
- estate->es_output_cid = GetCurrentCommandId(true);
-
- /*
- * A SELECT without modifying CTEs can't possibly queue triggers,
- * so force skip-triggers mode. This is just a marginal efficiency
- * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
- * all that expensive, but we might as well do it.
- */
- if (!queryDesc->plannedstmt->hasModifyingCTE)
- eflags |= EXEC_FLAG_SKIP_TRIGGERS;
- break;
-
- case CMD_INSERT:
- case CMD_DELETE:
- case CMD_UPDATE:
- case CMD_MERGE:
- estate->es_output_cid = GetCurrentCommandId(true);
- break;
-
- default:
- elog(ERROR, "unrecognized operation code: %d",
- (int) queryDesc->operation);
- break;
- }
-
- /*
- * Copy other important information into the EState
- */
- estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
- estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
- estate->es_top_eflags = eflags;
- estate->es_instrument = queryDesc->instrument_options;
- estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
-
/*
* Set up an AFTER-trigger statement context, unless told not to, or
* unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
@@ -313,6 +370,9 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
AfterTriggerBeginQuery();
+ estate = queryDesc->estate;
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
/*
* Initialize the plan state tree
*/
@@ -928,46 +988,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
- CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = cachedplan;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- *
- * This will also add the RT indexes of surviving leaf partitions to
- * es_unpruned_relids.
- */
- ExecDoInitialPruning(estate);
-
- if (!ExecPlanStillValid(estate))
- return;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -2961,9 +2989,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
- *
- * es_cachedplan is not copied because EPQ plan execution does not acquire
- * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ac6bc4326b..72f5973754 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1813,7 +1813,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* use the same index to retrieve the pruning results.
*/
void
-ExecDoInitialPruning(EState *estate)
+ExecDoInitialPruning(EState *estate, CachedPlan *cplan)
{
ListCell *lc;
List *locked_relids = NIL;
@@ -1842,7 +1842,7 @@ ExecDoInitialPruning(EState *estate)
else
validsubplan_rtis = all_leafpart_rtis;
- if (ExecShouldLockRelations(estate))
+ if (cplan && CachedPlanRequiresLocking(cplan))
{
int rtindex = -1;
@@ -1857,6 +1857,7 @@ ExecDoInitialPruning(EState *estate)
locked_relids = lappend_int(locked_relids, rtindex);
}
}
+
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
@@ -1867,7 +1868,7 @@ ExecDoInitialPruning(EState *estate)
* Release the useless locks if the plan won't be executed. This is the
* same as what CheckCachedPlan() in plancache.c does.
*/
- if (!ExecPlanStillValid(estate))
+ if (cplan && !CachedPlanValid(cplan))
{
foreach(lc, locked_relids)
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index a0843481f7..95d886884c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -17,6 +17,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
+#include "utils/plancache.h"
/* See execPartition.c for the definitions. */
typedef struct PartitionDispatchData *PartitionDispatch;
@@ -136,7 +137,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
-void ExecDoInitialPruning(EState *estate);
+void ExecDoInitialPruning(EState *estate, CachedPlan *cplan);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 02584dd154..b07942a457 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -76,6 +76,12 @@
typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
+/* Hook for plugins to get control in ExecutorStartCachedPlan() */
+typedef void (*ExecutorStartCachedPlan_hook_type) (QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
+extern PGDLLIMPORT ExecutorStartCachedPlan_hook_type ExecutorStartCachedPlan_hook;
+
/* Hook for plugins to get control in ExecutorRun() */
typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
ScanDirection direction,
@@ -203,6 +209,9 @@ extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
CachedPlanSource *plansource,
int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void standard_ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -265,31 +274,6 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
-/*
- * Is the CachedPlan in es_cachedplan still valid?
- *
- * Called from InitPlan() because invalidation messages that affect the plan
- * might be received after locks have been taken on runtime-prunable relations.
- * The caller should take appropriate action if the plan has become invalid.
- */
-static inline bool
-ExecPlanStillValid(EState *estate)
-{
- return estate->es_cachedplan == NULL ? true :
- CachedPlanValid(estate->es_cachedplan);
-}
-
-/*
- * Locks are needed only if running a cached plan that might contain unlocked
- * relations, such as a reused generic plan.
- */
-static inline bool
-ExecShouldLockRelations(EState *estate)
-{
- return estate->es_cachedplan == NULL ? false :
- CachedPlanRequiresLocking(estate->es_cachedplan);
-}
-
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9643a9d626..45a9e3e5a5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -640,7 +640,6 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
- CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 44aa828fdf..a3bfaed372 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -41,7 +41,7 @@ static int executor_start_lock_id = 0;
/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
-static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
+static ExecutorStartCachedPlan_hook_type prev_ExecutorStartCachedPlan_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -79,7 +79,9 @@ delay_execution_planner(Query *parse, const char *query_string,
/* ExecutorStart_hook function to provide the desired delay */
static void
-delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+delay_execution_ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
{
/* If enabled, delay by taking and releasing the specified lock */
if (executor_start_lock_id != 0)
@@ -97,13 +99,15 @@ delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
/* Now start the executor, possibly via a previous hook user */
- if (prev_ExecutorStart_hook)
- prev_ExecutorStart_hook(queryDesc, eflags);
+ if (prev_ExecutorStartCachedPlan_hook)
+ prev_ExecutorStartCachedPlan_hook(queryDesc, eflags, plansource,
+ query_index);
else
- standard_ExecutorStart(queryDesc, eflags);
+ standard_ExecutorStartCachedPlan(queryDesc, eflags, plansource,
+ query_index);
if (executor_start_lock_id != 0)
- elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ elog(NOTICE, "Finished ExecutorStartCachedPlan(): CachedPlan is %s",
CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
}
@@ -139,6 +143,6 @@ _PG_init(void)
/* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
- prev_ExecutorStart_hook = ExecutorStart_hook;
- ExecutorStart_hook = delay_execution_ExecutorStart;
+ prev_ExecutorStartCachedPlan_hook = ExecutorStartCachedPlan_hook;
+ ExecutorStartCachedPlan_hook = delay_execution_ExecutorStartCachedPlan;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
index 5bfb2b33b3..165f865b7a 100644
--- a/src/test/modules/delay_execution/expected/cached-plan-inval.out
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -32,8 +32,7 @@ t
(1 row)
step s1exec: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
-------------------------------------
LockRows
@@ -48,7 +47,7 @@ starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
step s1prep2: SET plan_cache_mode = force_generic_plan;
PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
EXPLAIN (COSTS OFF) EXECUTE q2;
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
--------------------------------------------------
Append
@@ -81,8 +80,7 @@ t
(1 row)
step s1exec2: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
--------------------------------------------
Append
@@ -98,9 +96,9 @@ starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
step s1prep3: SET plan_cache_mode = force_generic_plan;
PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
EXPLAIN (COSTS OFF) EXECUTE q3;
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
--------------------------------------------------------------
Nested Loop
@@ -178,10 +176,9 @@ t
(1 row)
step s1exec3: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
-------------------------------------------------------------
Nested Loop
@@ -233,7 +230,7 @@ step s1prep4: SET plan_cache_mode = force_generic_plan;
SET enable_seqscan TO off;
PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
EXPLAIN (COSTS OFF) EXECUTE q4 (1);
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
---------------------------------------------------------------
Result
@@ -264,8 +261,7 @@ t
(1 row)
step s1exec4: <... completed>
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
-s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStartCachedPlan(): CachedPlan is valid
QUERY PLAN
---------------------------------------------
Result
--
2.43.0
[application/octet-stream] v59-0004-Defer-locking-of-runtime-prunable-relations-in-c.patch (119.3K, 6-v59-0004-Defer-locking-of-runtime-prunable-relations-in-c.patch)
From be5908f8334da0a6eb639ff02e34ee5383571f4f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 4 Dec 2024 16:16:56 +0900
Subject: [PATCH v59 4/5] Defer locking of runtime-prunable relations in cached
plans
AcquireExecutorLocks() in plancache.c locks all relations in a
plan's range table to ensure the plan is safe for execution. However,
this approach also locks runtime-prunable relations that will later
be pruned during "initial" runtime pruning, introducing unnecessary
overhead. This commit defers locking for such relations and ensures
that any invalidation caused by this deferral is handled by
replanning when necessary.
* Locking changes:
The planner now tracks "unprunable" relations using the new
PlannedStmt.unprunableRelids field, which is computed during
set_plan_refs() by subtracting runtime-prunable relation RT indexes
(identified from PartitionPruneInfos) from all RT indexes.
AcquireExecutorLocks() locks only these unprunable relations.
During executor startup, ExecDoInitialPruning() identifies unpruned
partitions and acquires locks on them. A new es_unpruned_relids field
is added to EState to ensure that subsequent initialization steps
process only locked relations. It is initially populated with
PlannedStmt.unprunableRelids and updated by ExecDoInitialPruning()
with the RT indexes of the unpruned partitions. To populate
es_unpruned_relids, PartitionedRelPruneInfo and
PartitionedRelPruningData now include a leafpart_rti_map[] to map
partition indexes (as determined by get_matching_partitions()) to
their corresponding RT indexes.
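As an illustration only, the set arithmetic described above can be
modeled with a 64-bit mask standing in for a Bitmapset. This is a toy
sketch, not the patch's code: RelidSet, relid_add(), relid_is_member(),
compute_unprunable() and add_surviving_partitions() are hypothetical
names, and only leafpart_rti_map is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t RelidSet;

static RelidSet
relid_add(RelidSet set, int rti)
{
	assert(rti >= 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

static int
relid_is_member(RelidSet set, int rti)
{
	assert(rti >= 0 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/* Planner side (model): all RT indexes minus the runtime-prunable ones. */
static RelidSet
compute_unprunable(RelidSet all_rtis, RelidSet prunable_rtis)
{
	return all_rtis & ~prunable_rtis;
}

/*
 * Executor side (model): start from the unprunable set and add the RT index
 * of every partition that survived initial pruning.  Bit i of
 * surviving_parts refers to partition index i, which leafpart_rti_map[]
 * translates into an RT index.
 */
static RelidSet
add_surviving_partitions(RelidSet unpruned, const int *leafpart_rti_map,
						 int nparts, uint64_t surviving_parts)
{
	for (int i = 0; i < nparts && i < 64; i++)
	{
		if (surviving_parts & (UINT64_C(1) << i))
			unpruned = relid_add(unpruned, leafpart_rti_map[i]);
	}
	return unpruned;
}
```

With RT indexes 1..4, where 2..4 are prunable partitions, only RT
index 1 is unprunable; if just the second partition survives pruning,
the resulting set holds RT indexes 1 and 3.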
Executor code that works with child result relations and child
RowMarks requires adjustments because pruned relations are no longer
locked; without such adjustments, the executor could attempt to
process result relations or RowMarks for pruned partitions.
Specifically, ExecInitModifyTable() trims result relation-related
lists resultRelations, withCheckOptionLists, returningLists, and
updateColnosLists to include only unpruned partitions by checking
es_unpruned_relids. It also creates ResultRelInfo structs only for
these unpruned partitions. Similarly, child RowMarks whose owning
relations are pruned are now ignored, again by checking
es_unpruned_relids, ensuring only those associated with unpruned
relations are processed.
Finally, ExecCheckPermissions() now includes an Assert to verify that
all relations undergoing permission checks have been properly locked.
This safeguard helps catch any cases where relations that should have
been added to the unprunableRelids set were missed during planning.
* Changes related to handling plan invalidation:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. To ensure
correctness, a new ExecutorStartCachedPlan() function that wraps
ExecutorStart() is added to detect and handle invalid plans caused by
deferred locking. When invalidation occurs, ExecutorStartCachedPlan()
updates all plans in the CachedPlan using the new UpdateCachedPlan()
function and retries execution with the refreshed plan.
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. To
enable this, a new CachedPlan.stmt_context is introduced as a child
context of CachedPlan.context. This separates PlannedStmts from the
parent context, allowing UpdateCachedPlan() to free old PlannedStmts
when replacing them with new plans, while preserving the CachedPlan
structure, including the List containing the statements.
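The retry behavior described above can be pictured with a small
stand-alone model. This is only a sketch under simplifying
assumptions: ToyCachedPlan and the toy_* functions are hypothetical,
and any cleanup the real code performs between attempts is left out.

```c
#include <stdbool.h>

/* Hypothetical, much-simplified stand-in for a CachedPlan. */
typedef struct ToyCachedPlan
{
	bool		is_valid;	/* cleared when an invalidation arrives */
	int			generation;	/* bumped each time the plan is rebuilt */
	int			pending_invalidations;	/* simulated concurrent DDL events */
} ToyCachedPlan;

/*
 * Models executor startup: taking the deferred locks may deliver an
 * invalidation, which marks the plan invalid.
 */
static void
toy_executor_start(ToyCachedPlan *cplan)
{
	if (cplan->pending_invalidations > 0)
	{
		cplan->pending_invalidations--;
		cplan->is_valid = false;
	}
}

/* Models replanning: the stale plan is replaced by a fresh, valid one. */
static void
toy_update_cached_plan(ToyCachedPlan *cplan)
{
	cplan->generation++;
	cplan->is_valid = true;
}

/* Retry startup until the plan survives it; returns the attempt count. */
static int
toy_executor_start_cached_plan(ToyCachedPlan *cplan)
{
	int			attempts = 0;

	for (;;)
	{
		attempts++;
		toy_executor_start(cplan);
		if (cplan->is_valid)
			return attempts;
		toy_update_cached_plan(cplan);
	}
}
```

In this model, a plan with one pending invalidation needs two startup
attempts and ends up at generation 1.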
* Testing
Tests using the delay_execution module verify scenarios where a cached
plan becomes invalid due to changes in prunable relations after
deferred locks are taken.
* Note to extension authors:
ExecutorStart_hook implementations should verify plan validity after
calling standard_ExecutorStart() to ensure they are not working with
an invalid plan. The following check can be used:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
Additionally, any RT index inspected by an extension should be
checked against EState.es_unpruned_relids before processing the
relation, particularly if the relation could be a child relation
subject to initial partition pruning. This is necessary because
extensions can no longer assume that all range table relations are
locked; only those in es_unpruned_relids are. For reference, see
how InitPlan() processes entries from PlannedStmt.rowMarks.
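A stand-alone model of the two checks recommended above might look
like the following. It is a sketch, not extension code: ToyEState,
toy_plan_still_valid() and toy_extension_hook() are hypothetical, and
a 64-bit mask stands in for the es_unpruned_relids Bitmapset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for the relevant EState fields. */
typedef struct ToyEState
{
	bool		plan_valid;			/* models the plan validity check */
	uint64_t	unpruned_relids;	/* bit i set: RT index i is locked */
} ToyEState;

static bool
toy_plan_still_valid(const ToyEState *estate)
{
	return estate->plan_valid;
}

/*
 * Models the body of an extension hook after executor startup has run.
 * Returns the number of relations processed, or -1 if the hook bailed out
 * because the plan became invalid.
 */
static int
toy_extension_hook(const ToyEState *estate, const int *rtis, int nrtis)
{
	int			processed = 0;

	/* The plan may have become invalid during startup: do nothing. */
	if (!toy_plan_still_valid(estate))
		return -1;

	for (int i = 0; i < nrtis; i++)
	{
		int			rti = rtis[i];

		/* Skip RT indexes this toy mask cannot represent. */
		if (rti < 0 || rti >= 64)
			continue;

		/* Pruned relations are not locked, so they must be skipped. */
		if ((estate->unpruned_relids & (UINT64_C(1) << rti)) == 0)
			continue;

		processed++;
	}
	return processed;
}
```

Given an unpruned set holding RT indexes 1 and 3, a hook inspecting RT
indexes 1, 2 and 3 processes two relations, and processes none once
the plan has been marked invalid.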
Reviewed-by: Robert Haas
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
src/backend/commands/copyfrom.c | 3 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 16 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/commands/trigger.c | 14 +
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 132 +++++++-
src/backend/executor/execParallel.c | 10 +-
src/backend/executor/execPartition.c | 120 +++++++-
src/backend/executor/execUtils.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 8 +-
src/backend/executor/nodeLockRows.c | 9 +-
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/nodeModifyTable.c | 70 ++++-
src/backend/executor/spi.c | 23 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 29 +-
src/backend/partitioning/partprune.c | 22 ++
src/backend/replication/logical/worker.c | 3 +-
src/backend/replication/pgoutput/pgoutput.c | 3 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 39 ++-
src/backend/utils/cache/plancache.c | 204 +++++++++++--
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 6 +-
src/include/commands/trigger.h | 1 +
src/include/executor/execPartition.h | 6 +-
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 31 +-
src/include/nodes/execnodes.h | 13 +
src/include/nodes/pathnodes.h | 8 +
src/include/nodes/plannodes.h | 7 +
src/include/utils/plancache.h | 50 +++-
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-inval.out | 282 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 80 +++++
src/test/regress/expected/partition_prune.out | 44 +++
src/test/regress/sql/partition_prune.sql | 18 ++
48 files changed, 1299 insertions(+), 117 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 623a674f99..8b5eaf3ef3 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -298,6 +298,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 49c657b3e0..b11691ae26 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -994,6 +994,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 4d52c93c30..da3d73d9b2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -768,7 +768,8 @@ CopyFrom(CopyFromState cstate)
* index-entry-making machinery. (There used to be a huge amount of code
* here that basically duplicated execUtils.c ...)
*/
- ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos);
+ ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos,
+ bms_make_singleton(1));
resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, resultRelInfo, 1);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..27b6f6f069 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -556,7 +556,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 5c92e48a56..0cc74dd45a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -332,7 +332,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a3f1d53d7a..b5c734e75c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -512,7 +512,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -634,7 +635,9 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -690,7 +693,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -704,8 +707,11 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Prepare the plan for execution. */
+ if (queryDesc->cplan)
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ else
+ ExecutorStart(queryDesc, eflags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index af6bd8ff42..7d4a3c5b8d 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -907,6 +907,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 010097873d..69be74b4bd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ac52ca25e9..48cf0b84e5 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -117,6 +117,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index a93f970a29..45fd63d2b1 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -582,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int query_index = 0;
if (es->memory)
{
@@ -654,7 +656,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -665,6 +668,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ query_index++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 09356e46d1..79572ec8f1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5123,6 +5123,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..449c6068ae 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+flag. If the plan tree is outdated (is_valid = false), the executor stops
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartCachedPlan
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), control is
+immediately returned to ExecutorStartCachedPlan(), which will create a new plan
+tree and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5dc46f2e95..2dfa7eff54 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,11 +55,13 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -137,6 +139,62 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * ExecutorStartCachedPlan
+ * Start execution for a given query in the CachedPlanSource, replanning
+ * if the plan is invalidated due to deferred locks taken during the
+ * plan's initialization
+ *
+ * This function handles cases where the CachedPlan given in queryDesc->cplan
+ * might become invalid during the initialization of the plan given in
+ * queryDesc->plannedstmt, particularly when prunable relations in it are
+ * locked after performing initial pruning. If the locks invalidate the plan,
+ * the function calls UpdateCachedPlan() to replan all queries in the
+ * CachedPlan, and then retries initialization.
+ *
+ * The function repeats the process until ExecutorStart() successfully
+ * initializes the plan, that is, without the CachedPlan becoming invalid.
+ */
+void
+ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (unlikely(queryDesc->cplan == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
+ if (unlikely(plansource == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlanSource");
+
+ /*
+ * Loop and retry with an updated plan until no further invalidation
+ * occurs.
+ */
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanValid(queryDesc->cplan))
+ {
+ /*
+ * Clean up the current execution state before creating the new
+ * plan to retry ExecutorStart(). Mark execution as aborted to
+ * ensure that AFTER trigger state is properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ /* Retry ExecutorStart() with an updated plan tree. */
+ queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+ }
+ else
+ /*
+ * Exit the loop if the plan is initialized successfully and no
+ * sinval messages were received that invalidated the CachedPlan.
+ */
+ break;
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -320,6 +378,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -426,8 +485,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -490,11 +552,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
(PgStat_Counter) estate->es_parallel_workers_launched);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -508,6 +569,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -606,6 +675,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -831,6 +915,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If the plan originates from a CachedPlan (given in queryDesc->cplan),
+ * it can become invalid during runtime "initial" pruning when the
+ * remaining set of locks is taken. The function returns early in that
+ * case without initializing the plan, and the caller is expected to
+ * retry with a new valid plan.
* ----------------------------------------------------------------
*/
static void
@@ -838,6 +928,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -854,9 +945,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
/*
* initialize the node's execution state
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
+ bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -867,9 +960,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* executed, are saved in es_part_prune_results. These results correspond
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
*/
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -884,8 +983,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Relation relation;
ExecRowMark *erm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* get relation's OID (will produce InvalidOid if subquery) */
@@ -2857,6 +2961,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2928,6 +3035,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
}
}
+ /*
+ * Copy es_unpruned_relids so that RowMarks of pruned relations are
+ * ignored in ExecInitLockRows() and ExecInitModifyTable() when
+ * initializing the plan trees below.
+ */
+ rcestate->es_unpruned_relids = parentestate->es_unpruned_relids;
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b01a2fdfdd..08492166d4 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
+ pstmt->unprunableRelids = estate->es_unpruned_relids;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -1257,8 +1258,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 46dd1c77a3..ac6bc4326b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -182,7 +183,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis);
static void InitPartitionPruneContext(PartitionedRelPruningData *pprune,
PartitionPruneContext *context,
List *pruning_steps,
@@ -194,7 +196,8 @@ static void find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis);
/*
@@ -1764,7 +1767,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * all plan nodes that contain a PartitionPruneInfo.
+ * all plan nodes that contain a PartitionPruneInfo. This also locks the
+ * leaf partitions whose subnodes will be initialized if needed.
*
* ExecInitPartitionExecPruning:
* Updates the PartitionPruneState found at given part_prune_index in
@@ -1785,11 +1789,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * plan nodes that support partition pruning.
+ * plan nodes that support partition pruning. This also locks the leaf
+ * partitions whose subnodes will be initialized if needed.
*
* This function iterates over each PartitionPruneInfo entry in
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
@@ -1810,15 +1816,19 @@ void
ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
+ Bitmapset *all_leafpart_rtis = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo,
+ &all_leafpart_rtis);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
@@ -1827,10 +1837,45 @@ ExecDoInitialPruning(EState *estate)
* bitmapset or NULL as described in the header comment.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ else
+ validsubplan_rtis = all_leafpart_rtis;
+
+ if (ExecShouldLockRelations(estate))
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ locked_relids = lappend_int(locked_relids, rtindex);
+ }
+ }
+ estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
+ validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
+
+ /*
+ * Release the useless locks if the plan won't be executed. This is the
+ * same as what CheckCachedPlan() in plancache.c does.
+ */
+ if (!ExecPlanStillValid(estate))
+ {
+ foreach(lc, locked_relids)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
/*
@@ -1940,12 +1985,18 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
*
+ * On return, *all_leafpart_rtis will contain the RT indexes of all leaf
+ * partitions if initial pruning steps are skipped (e.g., during EXPLAIN
+ * (GENERIC_PLAN)). The caller is responsible for adding these RT indexes
+ * to estate->es_unpruned_relids.
+ *
* Note that the PartitionPruneContexts for both initial and exec pruning
* (which are stored in each PartitionedRelPruningData) are initialized lazily
* in find_matching_subplans_recurse() when used for the first time.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2042,8 +2093,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* The set of partitions that exist now might not be the same that
* existed when the plan was made. The normal case is that it is;
* optimize for that case with a quick comparison, and just copy
- * the subplan_map and make subpart_map point to the one in
- * PruneInfo.
+ * the subplan_map and make subpart_map, leafpart_rti_map point to
+ * the ones in PruneInfo.
*
* For the case where they aren't identical, we could have more
* partitions on either side; or even exactly the same number of
@@ -2062,6 +2113,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
sizeof(int) * partdesc->nparts) == 0)
{
pprune->subpart_map = pinfo->subpart_map;
+ pprune->leafpart_rti_map = pinfo->leafpart_rti_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
}
@@ -2082,6 +2134,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* mismatches.
*/
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->leafpart_rti_map = palloc(sizeof(int) * partdesc->nparts);
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
@@ -2099,6 +2152,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->leafpart_rti_map[pp_idx] =
+ pinfo->leafpart_rti_map[pd_idx];
pd_idx++;
continue;
}
@@ -2136,6 +2191,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map[pp_idx] = -1;
pprune->subplan_map[pp_idx] = -1;
+ pprune->leafpart_rti_map[pp_idx] = 0;
}
}
@@ -2171,6 +2227,25 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
prunestate->execparamids = bms_add_members(prunestate->execparamids,
pinfo->execparamids);
+ /*
+ * Return the RT indexes of all leaf partitions if we're skipping
+ * pruning in the EXPLAIN (GENERIC_PLAN) case.
+ */
+ if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
+ {
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
+ rtindex);
+ }
+ }
+
j++;
}
i++;
@@ -2414,10 +2489,15 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * The caller must pass a non-NULL validsubplan_rtis during initial pruning
+ * to collect the RT indexes of leaf partitions whose subnodes will be
+ * executed. These RT indexes are later added to EState.es_unpruned_relids.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2429,6 +2509,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* evaluated *and* there are steps in which to do so.
*/
Assert(initial_prune || prunestate->do_exec_prune);
+ Assert(validsubplan_rtis != NULL || !initial_prune);
/*
* Switch to a temp context to avoid leaking memory in the executor's
@@ -2453,7 +2534,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunestate->parent_plan,
prunedata, pprune, initial_prune,
- &result);
+ &result, validsubplan_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_context.initialized)
@@ -2470,6 +2551,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_copy(*validsubplan_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2480,14 +2563,17 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of their corresponding leaf partitions to *validsubplan_rtis if
+ * it's non-NULL.
*/
static void
find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *partset;
int i;
@@ -2530,8 +2616,13 @@ find_matching_subplans_recurse(PlanState *parent_plan,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
+ pprune->leafpart_rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2540,7 +2631,8 @@ find_matching_subplans_recurse(PlanState *parent_plan,
find_matching_subplans_recurse(parent_plan,
prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ validsubplan_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index bc905a0cdc..e855880fbc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -728,7 +729,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
* indexed by rangetable index.
*/
void
-ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
+ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids)
{
/* Remember the range table List as-is */
estate->es_range_table = rangeTable;
@@ -739,6 +741,14 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
/* Set size of associated arrays */
estate->es_range_table_size = list_length(rangeTable);
+ /*
+ * Initialize the bitmapset of RT indexes (es_unpruned_relids) representing
+ * relations that will be scanned during execution. This set is initially
+ * populated by the caller and may be extended later by ExecDoInitialPruning()
+ * to include RT indexes of unpruned leaf partitions.
+ */
+ estate->es_unpruned_relids = unpruned_relids;
+
/*
* Allocate an array to store an open Relation corresponding to each
* rangetable entry, and initialize entries to NULL. Relations are opened
@@ -760,6 +770,10 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed in ExecEndPlan().
+ *
+ * Note: The caller must ensure that 'rti' refers to an unpruned relation
+ * (i.e., it is a member of estate->es_unpruned_relids) before calling this
+ * function. Attempting to open a pruned relation will result in an error.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -768,6 +782,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ if (!bms_is_member(rti, estate->es_unpruned_relids))
+ elog(ERROR, "trying to open a pruned relation");
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 8d1fda2ddc..058c10b4d4 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index b77ff84840..89e05b19d0 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -581,7 +581,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -648,7 +648,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -724,7 +724,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -877,7 +877,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..cfead7ded2 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -347,8 +347,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e2032afcb7..0696dfe7eb 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -219,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1161520f76..7413a29eda 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -636,7 +636,7 @@ ExecInitUpdateProjection(ModifyTableState *mtstate,
Assert(whichrel >= 0 && whichrel < mtstate->mt_nrels);
}
- updateColnos = (List *) list_nth(node->updateColnosLists, whichrel);
+ updateColnos = (List *) list_nth(mtstate->mt_updateColnosLists, whichrel);
/*
* For UPDATE, we use the old tuple to fill up missing values in the tuple
@@ -4282,7 +4282,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ModifyTableState *mtstate;
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
- int nrels = list_length(node->resultRelations);
+ int nrels;
+ List *resultRelations = NIL;
+ List *withCheckOptionLists = NIL;
+ List *returningLists = NIL;
+ List *updateColnosLists = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4292,6 +4296,45 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+ /*
+ * Only consider unpruned relations for initializing their ResultRelInfo
+ * struct and other fields such as withCheckOptions, etc.
+ */
+ i = 0;
+ foreach(l, node->resultRelations)
+ {
+ Index rti = lfirst_int(l);
+
+ if (bms_is_member(rti, estate->es_unpruned_relids))
+ {
+ resultRelations = lappend_int(resultRelations, rti);
+ if (node->withCheckOptionLists)
+ {
+ List *withCheckOptions = list_nth_node(List,
+ node->withCheckOptionLists,
+ i);
+
+ withCheckOptionLists = lappend(withCheckOptionLists, withCheckOptions);
+ }
+ if (node->returningLists)
+ {
+ List *returningList = list_nth_node(List,
+ node->returningLists,
+ i);
+
+ returningLists = lappend(returningLists, returningList);
+ }
+ if (node->updateColnosLists)
+ {
+ List *updateColnosList = list_nth(node->updateColnosLists, i);
+
+ updateColnosLists = lappend(updateColnosLists, updateColnosList);
+ }
+ }
+ i++;
+ }
+ nrels = list_length(resultRelations);
+
/*
* create state structure
*/
@@ -4312,6 +4355,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->mt_merge_inserted = 0;
mtstate->mt_merge_updated = 0;
mtstate->mt_merge_deleted = 0;
+ mtstate->mt_updateColnosLists = updateColnosLists;
/*----------
* Resolve the target relation. This is the same as:
@@ -4329,6 +4373,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
if (node->rootRelation > 0)
{
+ Assert(bms_is_member(node->rootRelation, estate->es_unpruned_relids));
mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
node->rootRelation);
@@ -4343,7 +4388,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
- node->epqParam, node->resultRelations);
+ node->epqParam, resultRelations);
mtstate->fireBSTriggers = true;
/*
@@ -4361,7 +4406,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
resultRelInfo = mtstate->resultRelInfo;
i = 0;
- foreach(l, node->resultRelations)
+ foreach(l, resultRelations)
{
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
@@ -4505,7 +4550,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize any WITH CHECK OPTION constraints if needed.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->withCheckOptionLists)
+ foreach(l, withCheckOptionLists)
{
List *wcoList = (List *) lfirst(l);
List *wcoExprs = NIL;
@@ -4528,7 +4573,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/*
* Initialize RETURNING projections if needed.
*/
- if (node->returningLists)
+ if (returningLists)
{
TupleTableSlot *slot;
ExprContext *econtext;
@@ -4537,7 +4582,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize result tuple slot and assign its rowtype using the first
* RETURNING list. We assume the rest will look the same.
*/
- mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
+ mtstate->ps.plan->targetlist = (List *) linitial(returningLists);
/* Set up a slot for the output of the RETURNING projection(s) */
ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
@@ -4552,7 +4597,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->returningLists)
+ foreach(l, returningLists)
{
List *rlist = (List *) lfirst(l);
@@ -4653,8 +4698,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* Find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 2fb2e73604..a7f9824e4d 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1685,7 +1686,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2500,6 +2502,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int query_index = 0;
spicallbackarg.query = plansource->query_string;
@@ -2690,14 +2693,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, query_index);
FreeQueryDesc(qdesc);
}
else
@@ -2794,6 +2799,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ query_index++;
}
/* Done with this plan, so release refcount */
@@ -2871,7 +2878,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2927,7 +2935,10 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ if (queryDesc->cplan)
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ else
+ ExecutorStart(queryDesc, eflags);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9c253e864a..5fe2eeb65c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,6 +559,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(glob->allRelids,
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9f13243d54..053d2687f2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -564,7 +564,8 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
/*
* If it's a plain relation RTE (or a subquery that was once a view
- * reference), add the relation OID to relationOids.
+ * reference), add the relation OID to relationOids. Also add its new RT
+ * index to the set of relations that need to be locked for execution.
*
* We do this even though the RTE might be unreferenced in the plan tree;
* this would correspond to cases such as views that were expanded, child
@@ -576,7 +577,11 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
*/
if (newrte->rtekind == RTE_RELATION ||
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
+ {
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ glob->allRelids = bms_add_member(glob->allRelids,
+ list_length(glob->finalrtable));
+ }
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
@@ -1740,6 +1745,11 @@ set_customscan_references(PlannerInfo *root,
*
* Also update the RT indexes present in PartitionedRelPruneInfos to add the
* offset.
+ *
+ * Finally, if there are initial pruning steps, add the RT indexes of the
+ * leaf partitions to the set of relations that are prunable at execution
+ * startup time. This set indicates which relations should not be locked
+ * before executor startup, as they may be pruned during initial pruning.
*/
static int
register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
@@ -1762,8 +1772,25 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ int i;
prelinfo->rtindex += rtoffset;
+
+ for (i = 0; i < prelinfo->nparts; i++)
+ {
+ /*
+ * Non-leaf partitions and partitions that do not have a
+ * subplan are not included in this map as mentioned in
+ * make_partitionedrel_pruneinfo().
+ */
+ if (prelinfo->leafpart_rti_map[i])
+ {
+ prelinfo->leafpart_rti_map[i] += rtoffset;
+ if (prelinfo->initial_pruning_steps)
+ glob->prunableRelids = bms_add_member(glob->prunableRelids,
+ prelinfo->leafpart_rti_map[i]);
+ }
+ }
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index ae1d69f96c..03e596c405 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -645,6 +645,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ int *leafpart_rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -657,6 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ leafpart_rti_map = (int *) palloc0(nparts * sizeof(int));
present_parts = NULL;
i = -1;
@@ -671,9 +673,28 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of "leaf" partitions so they can be
+ * included in the PlannerGlobal.prunableRelids set, indicating
+ * relations whose locking is deferred until executor startup.
+ *
+ * We don't defer locking of sub-partitioned partitions because
+ * setting up PartitionedRelPruningData currently occurs before
+ * initial pruning, so the relation must be locked at that stage,
+ * even if it may be pruned.
+ *
+ * Only leaf partitions with a valid subplan that are prunable
+ * using initial pruning are added to prunableRelids. So
+ * partitions without a subplan due to constraint exclusion will
+ * remain in PlannedStmt.unprunableRelids and thus their locking
+ * will not be deferred even if they may ultimately be pruned due
+ * to initial pruning.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ leafpart_rti_map[i] = (int) partrel->relid;
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -695,6 +716,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->leafpart_rti_map = leafpart_rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 46d3ad566f..50ac03c0df 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -668,7 +668,8 @@ create_edata_for_relation(LogicalRepRelMapEntry *rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5e23453f07..3c0502e069 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -811,7 +811,8 @@ create_estate_for_relation(Relation rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
estate->es_output_cid = GetCurrentCommandId(false);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 42af768045..811e0a02df 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1236,6 +1236,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2038,7 +2039,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 0c45fcf318..fe52db1369 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -36,6 +37,9 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +69,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +82,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +128,9 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +143,9 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,14 +157,17 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ if (queryDesc->cplan)
+ ExecutorStartCachedPlan(queryDesc, 0, plansource, query_index);
+ else
+ ExecutorStart(queryDesc, 0);
/*
* Run the plan to completion.
@@ -493,6 +508,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -512,9 +528,13 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (portal->cplan)
+ ExecutorStartCachedPlan(queryDesc, myeflags,
+ portal->plansource, 0);
+ else
+ ExecutorStart(queryDesc, myeflags);
/*
* This tells PortalCleanup to shut down the executor
@@ -1194,6 +1214,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int query_index = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1275,6 +1296,9 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1284,6 +1308,9 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1348,6 +1375,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ query_index++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index c66a088f40..8908a0cdc2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,7 +101,8 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ bool release_generic);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -578,10 +579,17 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
+ *
+ * This also releases and drops the generic plan (plansource->gplan), if any,
+ * as most callers will typically build a new CachedPlan for the plansource
+ * right after this. However, when called from UpdateCachedPlan(), the
+ * function does not release the generic plan, as UpdateCachedPlan() updates
+ * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv)
+ QueryEnvironment *queryEnv,
+ bool release_generic)
{
bool snapshot_set;
RawStmt *rawtree;
@@ -678,8 +686,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference if any */
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference, if any, and if requested */
+ if (release_generic)
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -815,8 +824,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. However, the plans are not fully
+ * race-condition-free until the executor acquires locks on the prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -870,7 +882,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
- /* Successfully revalidated and locked the query. */
+ /*
+ * Successfully revalidated and locked the query. Set is_reused
+ * to true so that CachedPlanRequiresLocking() returns true.
+ */
+ plan->is_reused = true;
return true;
}
@@ -895,12 +911,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* To build a generic, parameter-value-independent plan, pass NULL for
* boundParams. To build a custom plan, pass the actual parameter values via
* boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * each parameter value; otherwise the planner will treat the value as a hint
+ * rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -911,6 +929,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
+ MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -928,7 +947,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/*
* If we don't already have a copy of the querytree list that can be
@@ -967,10 +986,19 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally we make a dedicated memory context for the CachedPlan and its
- * subsidiary data. (It's probably not going to be large, but just in
- * case, allow it to grow large. It's transient for the moment.) But for
- * a one-shot plan, we just leave it in the caller's memory context.
+ * Normally, we create a dedicated memory context for the CachedPlan and
+ * its subsidiary data. Although it's usually not very large, the context
+ * is designed to allow growth if necessary.
+ *
+ * The PlannedStmts are stored in a separate child context (stmt_context)
+ * of the CachedPlan's memory context. This separation allows
+ * UpdateCachedPlan() to free and replace the PlannedStmts without
+ * affecting the CachedPlan structure or its stmt_list List.
+ *
+ * For one-shot plans, we instead use the caller's memory context, as the
+ * CachedPlan will not persist. stmt_context will be set to NULL in this
+ * case, because UpdateCachedPlan() should never get called on a one-shot
+ * plan.
*/
if (!plansource->is_oneshot)
{
@@ -979,12 +1007,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- /*
- * Copy plan into the new context.
- */
- MemoryContextSwitchTo(plan_context);
+ stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan PlannedStmts",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
+ MemoryContextSetParent(stmt_context, plan_context);
+ MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
+
+ MemoryContextSwitchTo(plan_context);
+ plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1025,8 +1058,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
+ plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
+ plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1153,8 +1188,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid, but not all locks are acquired if the
+ * returned plan is a reused generic plan. In such cases, locks on relations
+ * subject to initial runtime pruning are not taken by CheckCachedPlan() but
+ * deferred until the execution startup phase, specifically when
+ * ExecDoInitialPruning() performs initial pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1180,7 +1218,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1276,6 +1314,113 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * UpdateCachedPlan
+ * Create fresh plans for all queries in the CachedPlanSource, replacing
+ * those in the generic plan's stmt_list, and return the plan for the
+ * query_index'th query.
+ *
+ * This function is primarily used by ExecutorStartCachedPlan() to handle
+ * cases where the original generic CachedPlan becomes invalid. Such
+ * invalidation may occur when prunable relations in the old plan for the
+ * query_index'th query are locked in preparation for execution.
+ *
+ * Note that invalidations received during the execution of the query_index'th
+ * query can affect both the queries that have already finished execution
+ * (e.g., due to concurrent modifications on prunable relations that were not
+ * locked during their execution) and also the queries that have not yet been
+ * executed. As a result, this function updates all plans to ensure
+ * CachedPlan.is_valid is safely set to true.
+ *
+ * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
+ * the caller and any of its callers must not rely on them remaining accessible
+ * after this function is called.
+ */
+PlannedStmt *
+UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ ListCell *l1,
+ *l2;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
+ else if (plan->is_oneshot)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan() returned
+ * the CachedPlan. See the comment in BuildCachedPlan() for details on why
+ * this might happen. Although the invalidation is likely a false positive
+ * as stated there, we revalidate the plansource to ensure the query list
+ * used for planning is up to date.
+ *
+ * The risk of catching an invalidation is higher here than when
+ * BuildCachedPlan() is called from GetCachedPlan(), because this function
+ * is normally called long after GetCachedPlan() returns the CachedPlan, so
+ * much more processing could have occurred including things that mark
+ * the CachedPlanSource invalid.
+ *
+ * Note: Do not release plansource->gplan, because the upstream callers
+ * (such as the callers of ExecutorStartCachedPlan()) would still be
+ * referencing it.
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv, false);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for all the queries after making a copy to be
+ * scribbled on by the planner.
+ */
+ query_list = copyObject(query_list);
+
+ /*
+ * Planning work is done in the caller's memory context. The resulting
+ * PlannedStmts are then copied into plan->stmt_context after throwing
+ * away the old ones.
+ */
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+ Assert(list_length(plan_list) == list_length(plan->stmt_list));
+
+ MemoryContextReset(plan->stmt_context);
+ oldcxt = MemoryContextSwitchTo(plan->stmt_context);
+ forboth (l1, plan_list, l2, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst(l1);
+
+ lfirst(l2) = copyObject(plannedstmt);
+ }
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * XXX Should this also (re)set the properties of the CachedPlan that are
+ * set in BuildCachedPlan() after creating the fresh plans such as
+ * planRoleId, dependsOnRole, and saved_xmin?
+ */
+
+ /*
+ * We've updated all the plans that might have been invalidated, so mark
+ * the CachedPlan as valid.
+ */
+ plan->is_valid = true;
+
+ /* Also update generic_cost because we just created a new generic plan. */
+ plansource->generic_cost = cached_plan_cost(plan, false);
+
+ return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1654,7 +1799,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv);
+ RevalidateCachedQuery(plansource, queryEnv, true);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -1776,7 +1921,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,13 +1939,16 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
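
The AcquireExecutorLocks() hunk above replaces a walk over the whole range table with a walk over the members of unprunableRelids, turning each 1-based RT index into a 0-based list position. What follows is only a minimal standalone sketch of that loop shape, not code from the patch: next_member() is a hypothetical stand-in for bms_next_member() over a single 64-bit word, and collect_unprunable(), the plain array "rtable" and the OID values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for bms_next_member(): return the smallest member
 * of "set" that is greater than "prev", or -2 when no members remain.
 */
static int
next_member(uint64_t set, int prev)
{
	for (int bit = prev + 1; bit < 64; bit++)
	{
		if (set & (UINT64_C(1) << bit))
			return bit;
	}
	return -2;
}

/*
 * Hypothetical helper: copy the OIDs of the relations whose RT indexes are
 * in "unprunable" into "out" and return how many were copied.  RT indexes
 * are 1-based while the array is 0-based, hence the "rtindex - 1".
 */
static int
collect_unprunable(uint64_t unprunable, const unsigned *rtable, int ntable,
				   unsigned *out)
{
	int			n = 0;
	int			rtindex = -1;

	while ((rtindex = next_member(unprunable, rtindex)) >= 0)
	{
		/* Every member is expected to be a valid RT index; no filtering. */
		assert(rtindex >= 1 && rtindex <= ntable);
		out[n++] = rtable[rtindex - 1];
	}
	return n;
}
```

With a four-entry table and a set containing RT indexes 1 and 3, the helper visits only the first and third entries. That is the point of the change: relations outside the set are never touched at this stage.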
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 93137820ac..ef4791bf65 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index aa5872bc15..09c1b1367a 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,10 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 0b34784922..a0843481f7 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -49,6 +49,8 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * leafpart_rti_map RT index by partition index, or 0 if not a leaf
+ * partition.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -69,6 +71,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ int *leafpart_rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -140,6 +143,7 @@ extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..02584dd154 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +265,30 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
+
+/*
+ * Locks are needed only if running a cached plan that might contain unlocked
+ * relations, such as a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -590,7 +618,8 @@ extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
-extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
+extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids);
extern void ExecCloseRangeTableRelations(EState *estate);
extern void ExecCloseResultRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f93061c7bf..9643a9d626 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -639,9 +640,14 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
+ Bitmapset *es_unpruned_relids; /* PlannedStmt.unprunableRelids + RT
+ * indexes of leaf partitions that
+ * survive initial pruning; see
+ * ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -687,6 +693,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
@@ -1426,6 +1433,12 @@ typedef struct ModifyTableState
double mt_merge_inserted;
double mt_merge_updated;
double mt_merge_deleted;
+
+ /*
+ * List of valid updateColnosLists. Contains only those belonging to
+ * unpruned relations from ModifyTable.updateColnosLists.
+ */
+ List *mt_updateColnosLists;
} ModifyTableState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index f8a4cd42c6..ef6156f30b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,14 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of all relation RTEs in finalrtable (RTE_RELATION and
+ * RTE_SUBQUERY RTEs of views) and of those that are subject to runtime
+ * pruning at plan initialization time ("initial" pruning).
+ */
+ Bitmapset *allRelids;
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index ef89927471..59699a1f86 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; set for
+ * AcquireExecutorLocks(). */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1476,6 +1480,9 @@ typedef struct PartitionedRelPruneInfo
/* subpart index by partition index, or -1 */
int *subpart_map pg_node_attr(array_size(nparts));
+ /* RT index by partition index, or 0 if not a leaf partition */
+ int *leafpart_rti_map pg_node_attr(array_size(nparts));
+
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..72862f5e85 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,8 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
+#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -139,10 +141,11 @@ typedef struct CachedPlanSource
* The reference count includes both the link from the parent CachedPlanSource
* (if any), and any active plan executions, so the plan can be discarded
* exactly when refcount goes to zero. Both the struct itself and the
- * subsidiary data live in the context denoted by the context field.
- * This makes it easy to free a no-longer-needed cached plan. (However,
- * if is_oneshot is true, the context does not belong solely to the CachedPlan
- * so no freeing is possible.)
+ * subsidiary data, except the PlannedStmts in stmt_list, live in the context
+ * denoted by the context field; the PlannedStmts live in the context denoted
+ * by stmt_context. Separate contexts make it easy to free a no-longer-needed
+ * cached plan. (However, if is_oneshot is true, the context does not belong
+ * solely to the CachedPlan so no freeing is possible.)
*/
typedef struct CachedPlan
{
@@ -150,6 +153,7 @@ typedef struct CachedPlan
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
+ bool is_reused; /* is it a reused generic plan? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
@@ -158,6 +162,10 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext stmt_context; /* context containing the PlannedStmts in
+ * stmt_list, but not the List itself which
+ * is in the above context; NULL if is_oneshot
+ * is true. */
} CachedPlan;
/*
@@ -223,6 +231,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -235,4 +247,34 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire additional locks?
+ *
+ * If the plan is a reused generic plan, the executor must acquire locks for
+ * relations that are not covered by AcquireExecutorLocks(), such as partitions
+ * that are subject to initial runtime pruning.
+ *
+ * Note: These locks are unnecessary if the plan is executed immediately after
+ * its creation, since the planner would have already acquired them. However,
+ * we do not optimize for that case.
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return !cplan->is_oneshot && cplan->is_reused;
+}
+
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check whether the plan is still valid after
+ * taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
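
Taken together, CachedPlanValid(), UpdateCachedPlan() and the ExecutorStartCachedPlan() prototype suggest a "start, check validity, replan, retry" control flow. The following is a rough standalone sketch of that flow under stated assumptions, not the patch's implementation: SketchPlan, start_with_retry() and start_invalidating() are hypothetical names, and the "replan" step merely flips a flag where the real code would call UpdateCachedPlan().

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a CachedPlan. */
typedef struct SketchPlan
{
	bool		is_valid;		/* mirrors CachedPlan.is_valid */
	int			times_replanned;	/* illustration only, not a real field */
} SketchPlan;

/* Hypothetical executor-startup callback; may clear plan->is_valid. */
typedef void (*sketch_start_fn) (SketchPlan *plan, void *arg);

/*
 * Run the startup callback until the plan survives it.  Each time startup
 * leaves the plan invalid, "replan" (here: just mark it valid again) and
 * start over.  Returns the number of replans that were needed.
 */
static int
start_with_retry(SketchPlan *plan, sketch_start_fn start, void *arg)
{
	int			replans = 0;

	for (;;)
	{
		start(plan, arg);
		if (plan->is_valid)
			return replans;

		/* Stand-in for UpdateCachedPlan(): build fresh plans, mark valid. */
		plan->times_replanned++;
		plan->is_valid = true;
		replans++;
	}
}

/*
 * Hypothetical startup callback that simulates an invalidation arriving
 * while locks are being taken, for the first *arg attempts only.
 */
static void
start_invalidating(SketchPlan *plan, void *arg)
{
	int		   *remaining = (int *) arg;

	if (*remaining > 0)
	{
		(*remaining)--;
		plan->is_valid = false;
	}
}
```

With one simulated invalidation, the loop runs startup twice and replans once. That matches the shape of the paired "CachedPlan is not valid" / "CachedPlan is valid" notices in the cached-plan-inval expected output, though the real code path naturally does much more, such as cleaning up the aborted executor state.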
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index fa4693a3f5..44aa828fdf 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/fmgrprotos.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..5bfb2b33b3
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,282 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(56 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(41 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Index Scan using foo12_1_a on foo12_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(9 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Disabled: true
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(10 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..f27e8fb521
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,80 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE TABLE bar (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE bar1 PARTITION OF bar FOR VALUES IN (1);
+ CREATE INDEX ON bar1(a);
+ CREATE TABLE bar2 PARTITION OF bar FOR VALUES IN (2);
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO UPDATE bar SET a = a WHERE a = one();
+ CREATE RULE update_bar AS ON UPDATE TO bar DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo, bar;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Another case with Append with run-time pruning in a subquery
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2dropi" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 7a03b4e360..705cd922fc 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4440,3 +4440,47 @@ drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (1, 'd');
+ a | b | c
+---+---+---
+ 1 | d | t
+(1 row)
+
+explain (costs off) execute update_part_abc_view (2, 'a');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_2 part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_2 part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (2, 'a');
+ERROR: new row violates check option for view "part_abc_view"
+DETAIL: Failing row contains (2, a, t).
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 442428d937..af26ad2fb2 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1339,3 +1339,21 @@ drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+execute update_part_abc_view (1, 'd');
+explain (costs off) execute update_part_abc_view (2, 'a');
+execute update_part_abc_view (2, 'a');
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
--
2.43.0
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2025-01-23 07:15 ` Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2025-01-23 07:15 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Thu, Dec 12, 2024 at 4:58 PM Amit Langote <[email protected]> wrote:
> To summarize how extensions can be affected:
>
> 1. Plan invalidation during standard_ExecutorStart(): A plan tree
> originating from a CachedPlan can become invalid during
> standard_ExecutorStart() due to locks taken on leaf partitions that
> survive initial pruning. Extensions should be updated to handle this
> scenario by checking ExecPlanStillValid(estate) immediately after
> calling standard_ExecutorStart() in their ExecutorStart_hook. If it
> returns false, the extensions should avoid further processing.
>
> 2. Validation of RT indexes: If the plan tree remains valid, any
> direct manipulation of relations using RT indexes must first verify
> that the RT index is present in the EState.es_unpruned_relids
> bitmapset. This bitmapset includes: a) RT indexes of relations that
> are originally unprunable (and locked during GetCachedPlan()), and
> b) RT indexes of leaf partitions that survive initial partition
> pruning. This step is crucial because pruned relations are not locked.
> Additionally, with the update in 0004, attempting to open pruned
> relations using ExecGetRangeTableRelation() will result in an error.
>
> I’d love to hear from anyone maintaining executor hooks, such as those
> from Timescale, Citus, or other extension developers. Please give this
> patch set (0001-0004) a try and let me know if you run into any issues
> or have feedback. 0005 is a sketch of an approach that eliminates the
> need for extensions to check ExecPlanStillValid() in their
> ExecutorStart_hook.
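As a rough illustration of point 2 in the summary quoted above, here is a minimal, self-contained sketch of the check an extension would make before touching a relation by RT index. The EState below is a stand-in that models es_unpruned_relids as a 64-bit mask rather than a real Bitmapset, and rt_index_is_unpruned() and count_touchable_relations() are hypothetical helper names. In actual extension code the membership test would presumably be bms_is_member(rti, estate->es_unpruned_relids).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for the executor's EState. This is NOT the PostgreSQL
 * definition: es_unpruned_relids is a Bitmapset there, modeled here
 * as a bitmask so the sketch compiles on its own.
 */
typedef struct EState
{
	uint64_t	es_unpruned_relids;
} EState;

/* Hypothetical helper: is this RT index unpruned (and thus locked)? */
static bool
rt_index_is_unpruned(const EState *estate, int rti)
{
	assert(rti > 0 && rti < 64);
	return (estate->es_unpruned_relids & (UINT64_C(1) << rti)) != 0;
}

/*
 * Hypothetical extension routine: walk a set of RT indexes and count
 * only the relations that are safe to open. Pruned relations are
 * skipped because no lock is held on them.
 */
static int
count_touchable_relations(const EState *estate, const int *rtis, int n)
{
	int			touchable = 0;

	for (int i = 0; i < n; i++)
	{
		if (!rt_index_is_unpruned(estate, rtis[i]))
			continue;			/* pruned, so leave it alone */
		touchable++;
	}
	return touchable;
}
```

The point of the sketch is only the ordering: the membership test comes first, and anything that opens or inspects the relation comes after it.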
I’ve rebased over recent changes to setrefs.c (commit bf826ea0629).
During the rebase, I realized that the patch
0002-Initialize-PartitionPruneContexts-lazily wasn’t a good idea after
all.
The test case added by bf826ea0629 highlighted an issue: initializing
pruning expressions lazily during execution could leave the
Append/MergeAppend node’s PlanState.subPlan uninitialized at
ExecInitNode() time. Initially, I thought this would have only
cosmetic consequences -- such as changes in test case output where
SubPlans referenced in "exec" pruning expressions wouldn’t appear --
but I may have underestimated the problem. As a result, I’ve abandoned
that approach and the patch in favor of initializing all pruning
expressions during plan initialization.
Additionally, I revisited the impact of the main patch on
ExecutorStart_hooks. It seems better to change the return type from
void to bool, returning the result of
ExecPlanStillValid(queryDesc->estate). This change has the added
benefit of breaking extensions that use ExecutorStart_hook at compile
time, encouraging authors to update their code. The updated commit
message includes details on additional checks extensions must
implement, particularly for cases where they might access pruned and
thus unlocked relations.
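To make the hook change concrete, the following is a minimal sketch of what an updated ExecutorStart_hook might look like under the bool-returning signature proposed here. It is self-contained and does not compile against PostgreSQL: EState, QueryDesc, ExecPlanStillValid() and standard_ExecutorStart() are simplified stand-ins, and my_ExecutorStart() and the ext_initialized field are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types; the real definitions live in the PostgreSQL headers. */
typedef struct EState
{
	bool		plan_still_valid;	/* models the cached plan's validity */
} EState;

typedef struct QueryDesc
{
	EState	   *estate;
	int			ext_initialized;	/* hypothetical per-extension state */
} QueryDesc;

/* Stand-in for ExecPlanStillValid(). */
static bool
ExecPlanStillValid(EState *estate)
{
	return estate->plan_still_valid;
}

/* Stand-in for standard_ExecutorStart() with the proposed bool result. */
static bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	(void) eflags;
	return ExecPlanStillValid(queryDesc->estate);
}

/*
 * Hypothetical extension hook. If the plan was invalidated while the
 * executor was starting up, return false right away and skip all
 * extension-specific setup, leaving it to the caller to replan.
 */
static bool
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	assert(queryDesc != NULL && queryDesc->estate != NULL);

	if (!standard_ExecutorStart(queryDesc, eflags))
		return false;

	queryDesc->ext_initialized = 1; /* extension setup goes here */
	return true;
}
```

A hook still declared as returning void would no longer match the hook type, which is the compile-time breakage mentioned above.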
I've stared at the refactoring patches 0001 and 0002 for long enough
at this point that I'd like to commit them early next week, barring
further comments or objections. I'll keep staring at 0003.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v60-0002-Perform-runtime-initial-pruning-outside-ExecInit.patch (30.2K, 2-v60-0002-Perform-runtime-initial-pruning-outside-ExecInit.patch)
From c1f8aed91ee1a43871b09d9ba30a6e210c89cb28 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 23 Jan 2025 14:49:48 +0900
Subject: [PATCH v60 2/3] Perform runtime initial pruning outside
ExecInitNode()
This commit builds on the prior change that moved PartitionPruneInfos
out of individual plan nodes into a list in PlannedStmt. It moves the
initialization of PartitionPruneStates and runtime initial pruning
from ExecInitNode() to a new routine, ExecDoInitialPruning().
ExecDoInitialPruning() is called by InitPlan() before calling
ExecInitNode() on the main plan tree and subplans. It performs the
initial pruning and saves the result -- a Bitmapset of indexes for
surviving child subnodes -- in es_part_prune_results, a list in
EState.
PartitionPruneStates created for initial pruning are stored in
es_part_prune_states, another list in EState, for later use during
exec pruning. Both lists are parallel to es_part_prune_infos, which
holds the PartitionPruneInfos from PlannedStmt, enabling shared
indexing.
PartitionPruneStates initialized in ExecDoInitialPruning() now include
only the PartitionPruneContexts for initial pruning steps. Exec
pruning contexts are initialized later in
ExecInitPartitionExecPruning() when the parent plan node is
initialized, as the exec pruning step expressions depend on the
parent node's PlanState. The existing function
PartitionPruneFixSubPlanMap() has been repurposed for this
initialization to avoid duplicating a similar loop structure for
finding PartitionedRelPruningData to initialize exec contexts for. It
has been renamed to InitExecPartitionPruneContexts() to reflect its new
primary responsibility. The original logic to "fix subplan maps"
remains intact but is now encapsulated within the renamed function.
To ensure exec pruning contexts are not accessed before initialization,
a new boolean field, 'initialized', is added to PartitionPruneContext.
This commit removes two obsolete Asserts in partkey_datum_from_expr().
The ExprContext used for pruning expression evaluation is now
independent of the parent PlanState, making these Asserts unnecessary.
Reviewed-by: Robert Haas
Reviewed-by: Tom Lane
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 12 +
src/backend/executor/execPartition.c | 318 ++++++++++++++++++-------
src/backend/executor/nodeAppend.c | 10 +-
src/backend/executor/nodeMergeAppend.c | 10 +-
src/backend/partitioning/partprune.c | 7 +-
src/include/executor/execPartition.h | 18 +-
src/include/nodes/execnodes.h | 2 +
src/include/partitioning/partprune.h | 2 +
8 files changed, 266 insertions(+), 113 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1d27b840ca9..604cb0625b8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -46,6 +46,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
@@ -855,6 +856,17 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ /*
+ * Perform runtime "initial" pruning to identify which child subplans,
+ * corresponding to the children of plan nodes that contain
+ * PartitionPruneInfo such as Append, will not be executed. The results,
+ * which are bitmapsets of indexes of the child subplans that will be
+ * executed, are saved in es_part_prune_results. These results correspond
+ * to each PartitionPruneInfo entry, and the es_part_prune_results list is
+ * parallel to es_part_prune_infos.
+ */
+ ExecDoInitialPruning(estate);
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 16aec59d0ec..70f11913ad4 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,7 +181,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
+static PartitionPruneState *CreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
@@ -189,9 +189,10 @@ static void InitPartitionPruneContext(PartitionPruneContext *context,
PartitionKey partkey,
PlanState *planstate,
ExprContext *econtext);
-static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
- Bitmapset *initially_valid_subplans,
- int n_total_subplans);
+static void InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
+ PlanState *parent_plan,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1762,48 +1763,106 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
- * ExecInitPartitionPruning:
- * Creates the PartitionPruneState required by ExecFindMatchingSubPlans.
- * Details stored include how to map the partition index returned by the
- * partition pruning code into subplan indexes. Also determines the set
- * of subplans to initialize considering the result of performing initial
- * pruning steps if any. Maps in PartitionPruneState are updated to
+ * ExecDoInitialPruning:
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode() for
+ * all plan nodes that contain a PartitionPruneInfo.
+ *
+ * ExecInitPartitionExecPruning:
+ * Updates the PartitionPruneState found at given part_prune_index in
+ * EState.es_part_prune_states for use during "exec" pruning if required.
+ * Also returns the set of subplans to initialize that would be stored at
+ * part_prune_index in EState.es_part_prune_results by
+ * ExecDoInitialPruning(). Maps in PartitionPruneState are updated to
* account for initial pruning possibly having eliminated some of the
* subplans.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
- * called during ExecInitPartitionPruning() to find the initially
- * matching subplans based on performing the initial pruning steps and
- * then must be called again each time the value of a Param listed in
+ * called during ExecDoInitialPruning() to find the initially matching
+ * subplans based on performing the initial pruning steps and then must be
+ * called again each time the value of a Param listed in
* PartitionPruneState's 'execparamids' changes.
*-------------------------------------------------------------------------
*/
/*
- * ExecInitPartitionPruning
- * Initialize data structure needed for run-time partition pruning and
- * do initial pruning if needed
+ * ExecDoInitialPruning
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode() for
+ * plan nodes that support partition pruning.
+ *
+ * This function iterates over each PartitionPruneInfo entry in
+ * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
+ * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * these states through their corresponding indexes in es_part_prune_states and
+ * assigns each state to the parent node's PlanState, from where it will be used
+ * for "exec" pruning.
+ *
+ * If initial pruning steps exist for a PartitionPruneInfo entry, this function
+ * executes those pruning steps and stores the result as a bitmapset of valid
+ * child subplans, identifying which subplans should be initialized for
+ * execution. The results are saved in estate->es_part_prune_results.
+ *
+ * If no initial pruning is performed for a given PartitionPruneInfo, a NULL
+ * entry is still added to es_part_prune_results to maintain alignment with
+ * es_part_prune_infos. This ensures that ExecInitPartitionExecPruning() can
+ * use the same index to retrieve the pruning results.
+ */
+void
+ExecDoInitialPruning(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans = NULL;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+
+ /*
+ * Perform initial pruning steps, if any, and save the result
+ * bitmapset or NULL as described in the header comment.
+ */
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ validsubplans);
+ }
+}
+
+/*
+ * ExecInitPartitionExecPruning
+ * Initialize the data structures needed for runtime "exec" partition
+ * pruning and return the result of initial pruning, if available.
*
* 'relids' identifies the relation to which both the parent plan and the
* PartitionPruneInfo given by 'part_prune_index' belong.
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning would have been performed by ExecDoInitialPruning(), if
+ * necessary, and the bitmapset of surviving subplans' indexes would have
+ * been stored as the part_prune_index'th element of
+ * EState.es_part_prune_results.
*
- * If subplans are indeed pruned, subplan_map arrays contained in the returned
- * PartitionPruneState are re-sequenced to not count those, though only if the
- * maps will be needed for subsequent execution pruning passes.
+ * If subplans were indeed pruned during initial pruning, the subplan_map
+ * arrays in the returned PartitionPruneState are re-sequenced to exclude those
+ * subplans, but only if the maps will be needed for subsequent execution
+ * pruning passes.
*/
PartitionPruneState *
-ExecInitPartitionPruning(PlanState *planstate,
- int n_total_subplans,
- int part_prune_index,
- Bitmapset *relids,
- Bitmapset **initially_valid_subplans)
+ExecInitPartitionExecPruning(PlanState *planstate,
+ int n_total_subplans,
+ int part_prune_index,
+ Bitmapset *relids,
+ Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
@@ -1819,17 +1878,19 @@ ExecInitPartitionPruning(PlanState *planstate,
bmsToString(pruneinfo->relids), part_prune_index,
bmsToString(relids));
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
-
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
-
/*
- * Perform an initial partition prune pass, if required.
+ * The PartitionPruneState would have been created by
+ * ExecDoInitialPruning() and stored as the part_prune_index'th element of
+ * EState.es_part_prune_states.
*/
+ prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
+ Assert(prunestate != NULL);
+
+ /* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = list_nth_node(Bitmapset,
+ estate->es_part_prune_results,
+ part_prune_index);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1839,22 +1900,21 @@ ExecInitPartitionPruning(PlanState *planstate,
}
/*
- * Re-sequence subplan indexes contained in prunestate to account for any
- * that were removed above due to initial pruning. No need to do this if
- * no steps were removed.
+ * The exec pruning state must also be initialized, if needed, before it
+ * can be used for pruning during execution.
+ *
+ * This also re-sequences subplan indexes contained in prunestate to
+ * account for any that were removed due to initial pruning; refer to the
+ * condition in InitExecPartitionPruneContexts() that is used to determine
+ * whether to do this. If no exec pruning needs to be done, we would thus
+ * leave the maps in an invalid state, but that's ok since
+ * that data won't be consulted again (cf initial Assert in
+ * ExecFindMatchingSubPlans).
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
- {
- /*
- * We can safely skip this when !do_exec_prune, even though that
- * leaves invalid data in prunestate, because that data won't be
- * consulted again (cf initial Assert in ExecFindMatchingSubPlans).
- */
- if (prunestate->do_exec_prune)
- PartitionPruneFixSubPlanMap(prunestate,
- *initially_valid_subplans,
- n_total_subplans);
- }
+ if (prunestate->do_exec_prune)
+ InitExecPartitionPruneContexts(prunestate, planstate,
+ *initially_valid_subplans,
+ n_total_subplans);
return prunestate;
}
@@ -1863,7 +1923,11 @@ ExecInitPartitionPruning(PlanState *planstate,
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * This includes PartitionPruneContexts (stored in each
+ * PartitionedRelPruningData corresponding to a PartitionedRelPruneInfo),
+ * which hold the ExprStates needed to evaluate pruning expressions, and
+ * mapping arrays to convert partition indexes from the pruning logic
+ * into subplan indexes in the parent plan node's list of child subplans.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1875,16 +1939,25 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Note that only the PartitionPruneContexts for initial pruning are
+ * initialized here. Those required for exec pruning are initialized later in
+ * ExecInitPartitionExecPruning(), as they depend on the availability of the
+ * parent plan node's PlanState.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+
+ /*
+ * Expression context that will be used by partkey_datum_from_expr() to
+ * evaluate expressions for comparison against partition bounds.
+ */
+ ExprContext *econtext = CreateExprContext(estate);
/* For data reading, executor always includes detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1901,6 +1974,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
palloc(offsetof(PartitionPruneState, partprunedata) +
sizeof(PartitionPruningData *) * n_part_hierarchies);
+ /* Save ExprContext for use during InitExecPartitionPruneContexts(). */
+ prunestate->econtext = econtext;
prunestate->execparamids = NULL;
/* other_subplans can change at runtime, so we need our own copy */
prunestate->other_subplans = bms_copy(pruneinfo->other_subplans);
@@ -1950,6 +2025,10 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /* Remember for InitExecPartitionPruneContexts(). */
+ pprune->partrel = partrel;
+
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
@@ -2061,29 +2140,31 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->present_parts = bms_copy(pinfo->present_parts);
/*
- * Initialize pruning contexts as needed. Note that we must skip
- * execution-time partition pruning in EXPLAIN (GENERIC_PLAN),
- * since parameter values may be missing.
+ * Only initial_context is initialized here. exec_context is
+ * initialized during ExecInitPartitionExecPruning() when the
+ * parent plan's PlanState is available.
+ *
+ * Note that we must skip execution-time (both "initial" and "exec")
+ * partition pruning in EXPLAIN (GENERIC_PLAN), since parameter
+ * values may be missing.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
+ pprune->initial_context.initialized = false;
if (pinfo->initial_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
{
InitPartitionPruneContext(&pprune->initial_context,
- pinfo->initial_pruning_steps,
- partdesc, partkey, planstate,
+ pprune->initial_pruning_steps,
+ partdesc, partkey, NULL,
econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ pprune->exec_context.initialized = false;
if (pinfo->exec_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
{
- InitPartitionPruneContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -2118,6 +2199,9 @@ InitPartitionPruneContext(PartitionPruneContext *context,
int partnatts;
ListCell *lc;
+ /* Never call twice for a context. */
+ Assert(!context->initialized);
+
n_steps = list_length(pruning_steps);
context->strategy = partkey->strategy;
@@ -2185,13 +2269,22 @@ InitPartitionPruneContext(PartitionPruneContext *context,
}
}
}
+
+ context->initialized = true;
}
/*
- * PartitionPruneFixSubPlanMap
- * Fix mapping of partition indexes to subplan indexes contained in
- * prunestate by considering the new list of subplans that survived
- * initial pruning
+ * InitExecPartitionPruneContexts
+ * Initialize exec pruning contexts deferred by CreatePartitionPruneState()
+ *
+ * This function finalizes exec pruning setup for a PartitionPruneState by
+ * initializing contexts for pruning steps that require the parent plan's
+ * PlanState. It iterates over PartitionPruningData entries and sets up the
+ * necessary execution contexts for pruning during query execution.
+ *
+ * Also fix the mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived initial
+ * pruning.
*
* Current values of the indexes present in PartitionPruneState count all the
* subplans that would be present before initial pruning was done. If initial
@@ -2202,27 +2295,39 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* subplans in the post-initial-pruning set.
*/
static void
-PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
- Bitmapset *initially_valid_subplans,
- int n_total_subplans)
+InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
+ PlanState *parent_plan,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
{
- int *new_subplan_indexes;
+ EState *estate;
+ int *new_subplan_indexes = NULL;
Bitmapset *new_other_subplans;
int i;
int newidx;
+ bool fix_subplan_map = false;
- /*
- * First we must build a temporary array which maps old subplan indexes to
- * new ones. For convenience of initialization, we use 1-based indexes in
- * this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
+ Assert(prunestate->do_exec_prune);
+ Assert(parent_plan != NULL);
+ estate = parent_plan->state;
+
+ if (bms_num_members(initially_valid_subplans) < n_total_subplans)
{
- Assert(i < n_total_subplans);
- new_subplan_indexes[i] = newidx++;
+ fix_subplan_map = true;
+
+ /*
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
+ */
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
+ {
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
}
/*
@@ -2247,6 +2352,29 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
int nparts = pprune->nparts;
int k;
+ /* Initialize PartitionPruneContext for exec pruning, if needed. */
+ if (pprune->exec_pruning_steps != NIL)
+ {
+ PartitionKey partkey;
+ PartitionDesc partdesc;
+
+ /*
+ * See the comment in CreatePartitionPruneState() regarding
+ * the usage of partdesc and partkey.
+ */
+ partkey = RelationGetPartitionKey(pprune->partrel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ pprune->partrel);
+
+ InitPartitionPruneContext(&pprune->exec_context,
+ pprune->exec_pruning_steps,
+ partdesc, partkey, parent_plan,
+ prunestate->econtext);
+ }
+
+ if (!fix_subplan_map)
+ continue;
+
/* We just rebuild present_parts from scratch */
bms_free(pprune->present_parts);
pprune->present_parts = NULL;
@@ -2288,19 +2416,22 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
}
/*
- * We must also recompute the other_subplans set, since indexes in it may
- * change.
+ * If we fixed subplan maps, we must also recompute the other_subplans
+ * set, since indexes in it may change.
*/
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
+ if (fix_subplan_map)
+ {
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- pfree(new_subplan_indexes);
+ pfree(new_subplan_indexes);
+ }
}
/*
@@ -2352,8 +2483,11 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
&result);
/* Expression eval may have used space in ExprContext too */
- if (pprune->exec_pruning_steps)
+ if (pprune->exec_context.initialized)
+ {
+ Assert(pprune->exec_pruning_steps);
ResetExprContext(pprune->exec_context.exprcontext);
+ }
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 986ef34030a..2397e5e17b0 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -144,11 +144,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans to initialize (validsubplans) by taking into account the
* result of performing initial pruning if any.
*/
- prunestate = ExecInitPartitionPruning(&appendstate->ps,
- list_length(node->appendplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
+ prunestate = ExecInitPartitionExecPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_index,
+ node->apprelids,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 1468f942388..b2dc6626c99 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -92,11 +92,11 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans to initialize (validsubplans) by taking into account the
* result of performing initial pruning if any.
*/
- prunestate = ExecInitPartitionPruning(&mergestate->ps,
- list_length(node->mergeplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
+ prunestate = ExecInitPartitionExecPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_index,
+ node->apprelids,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d3f60cc87c9..4693eef0c58 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -3783,13 +3783,8 @@ partkey_datum_from_expr(PartitionPruneContext *context,
/*
* We should never see a non-Const in a step unless the caller has
* passed a valid ExprContext.
- *
- * When context->planstate is valid, context->exprcontext is same as
- * context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL || context->exprcontext != NULL);
- Assert(context->planstate == NULL ||
- (context->exprcontext == context->planstate->ps_ExprContext));
+ Assert(context->exprcontext != NULL);
exprstate = context->exprstates[stateidx];
ectx = context->exprcontext;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 13177831d9f..855fed4fea5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -42,6 +42,9 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* PartitionedRelPruneInfo (see plannodes.h); though note that here,
* subpart_map contains indexes into PartitionPruningData.partrelprunedata[].
*
+ * partrel Partitioned table Relation; obtained by
+ * ExecGetRangeTableRelation(estate, rti), where
+ * rti is PartitionedRelPruneInfo.rtindex.
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
@@ -58,6 +61,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
*/
typedef struct PartitionedRelPruningData
{
+ Relation partrel;
int nparts;
int *subplan_map;
int *subpart_map;
@@ -90,6 +94,8 @@ typedef struct PartitionPruningData
* the clauses being unable to match to any tuple that the subplan could
* possibly produce.
*
+ * econtext Standalone ExprContext to evaluate expressions in
+ * the pruning steps
* execparamids Contains paramids of PARAM_EXEC Params found within
* any of the partprunedata structs. Pruning must be
* done again each time the value of one of these
@@ -112,6 +118,7 @@ typedef struct PartitionPruningData
*/
typedef struct PartitionPruneState
{
+ ExprContext *econtext;
Bitmapset *execparamids;
Bitmapset *other_subplans;
MemoryContext prune_context;
@@ -121,11 +128,12 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
-extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
- int n_total_subplans,
- int part_prune_index,
- Bitmapset *relids,
- Bitmapset **initially_valid_subplans);
+extern void ExecDoInitialPruning(EState *estate);
+extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
+ int n_total_subplans,
+ int part_prune_index,
+ Bitmapset *relids,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8ce4430af04..aca15f771a2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -656,6 +656,8 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
+ List *es_part_prune_states; /* List of PartitionPruneState */
+ List *es_part_prune_results; /* List of Bitmapset */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c413734789a..0ed39c89c3d 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -26,6 +26,7 @@ struct RelOptInfo;
* Stores information needed at runtime for pruning computations
* related to a single partitioned table.
*
+ * initialized Has the information in this struct been initialized?
* strategy Partition strategy, e.g. LIST, RANGE, HASH.
* partnatts Number of columns in the partition key.
* nparts Number of partitions in this partitioned table.
@@ -48,6 +49,7 @@ struct RelOptInfo;
*/
typedef struct PartitionPruneContext
{
+ bool initialized;
char strategy;
int partnatts;
int nparts;
--
2.43.0
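The subplan-index remapping that InitExecPartitionPruneContexts() performs in the patch above can be illustrated in isolation. The following is only a minimal sketch: `SubplanSet`, `build_subplan_remap()` and `remap_subplan_set()` are hypothetical names, a 64-bit mask stands in for a Bitmapset, and `calloc()` stands in for `palloc0()`. It is not the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a Bitmapset: bit i set means subplan i is a
 * member.  Limited to 64 subplans to keep the sketch short.
 */
typedef uint64_t SubplanSet;

/*
 * Build an array mapping each old subplan index to its new position.
 * As in the patch, entries are 1-based and pruned subplans are left as 0.
 * Returns NULL if allocation fails; the caller frees the result.
 */
int *
build_subplan_remap(SubplanSet valid, int n_total)
{
    int *map = calloc((size_t) n_total, sizeof(int));
    int  newidx = 1;

    if (map == NULL)
        return NULL;
    for (int i = 0; i < n_total; i++)
    {
        if (valid & (UINT64_C(1) << i))
            map[i] = newidx++;
    }
    return map;
}

/*
 * Translate a set of old subplan indexes into post-pruning 0-based indexes.
 * Members that were pruned are skipped here; that is a choice made for this
 * sketch, the patch itself does not check for it.
 */
SubplanSet
remap_subplan_set(SubplanSet old, const int *map, int n_total)
{
    SubplanSet result = 0;

    for (int i = 0; i < n_total; i++)
    {
        if ((old & (UINT64_C(1) << i)) && map[i] > 0)
            result |= UINT64_C(1) << (map[i] - 1);
    }
    return result;
}
```

With four subplans of which {0, 2, 3} survive initial pruning, old index 3 becomes new index 2, which is the kind of shift the other_subplans recomputation has to account for.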
[application/octet-stream] v60-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch (21.1K, 3-v60-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch)
From 12ee93694279433a49c740f7fbce5bcd25a4cdbd Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 23 Jan 2025 14:01:14 +0900
Subject: [PATCH v60 1/3] Move PartitionPruneInfo out of plan nodes into PlannedStmt
This change moves PartitionPruneInfo from individual plan nodes to
PlannedStmt, allowing runtime initial pruning to span the entire plan
tree without needing to traverse it to find PartitionPruneInfos.
The PartitionPruneInfo pointer fields in Append and MergeAppend nodes
are replaced with an integer index pointing to a list of
PartitionPruneInfos in PlannedStmt, which now holds all
PartitionPruneInfos for the main query and its subqueries.
A bitmapset field is added to PartitionPruneInfo to store the RT
indexes corresponding to the apprelids field in Append or MergeAppend.
This lets the executor verify that the PartitionPruneInfo it looks up
by index belongs to the plan node being initialized.
Duplicated code in set_append_references() and
set_mergeappend_references() is refactored into a new function,
register_partpruneinfo(). This function updates RT indexes by applying
rtoffset and adds PartitionPruneInfo to the global list in
PlannerGlobal.
Reviewed-by: Alvaro Herrera
Reviewed-by: Robert Haas
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 17 +++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/createplan.c | 23 +++---
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 103 ++++++++++++++----------
src/backend/partitioning/partprune.c | 19 +++--
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 16 +++-
src/include/partitioning/partprune.h | 8 +-
15 files changed, 137 insertions(+), 73 deletions(-)
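The index-based scheme described in the commit message (a per-query list, a global list, and a relids cross-check at lookup time) can be sketched as below. All names here (`PruneInfo`, `InfoList`, `register_info()`, `lookup_info()`) are hypothetical simplifications. The real code uses PostgreSQL `List` and `Bitmapset` and raises an error on mismatch, whereas this sketch uses a fixed array, an `unsigned` bitmask, and a NULL return.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFOS 16            /* arbitrary cap chosen for the sketch */

/* Hypothetical, simplified stand-in for PartitionPruneInfo. */
typedef struct PruneInfo
{
    unsigned relids;            /* bitmask: bit i set => RT index i */
} PruneInfo;

/* Hypothetical stand-in for a List of PruneInfo pointers. */
typedef struct InfoList
{
    PruneInfo *items[MAX_INFOS];
    int        len;
} InfoList;

/* Append an entry and return its 0-based position, or -1 if the list is full. */
int
info_list_append(InfoList *list, PruneInfo *info)
{
    if (list->len >= MAX_INFOS)
        return -1;
    list->items[list->len] = info;
    return list->len++;
}

/*
 * Take the entry at local_index in the per-query list, shift its RT indexes
 * by rtoffset, append it to the global list, and return the global index.
 */
int
register_info(InfoList *global, InfoList *local, int local_index, int rtoffset)
{
    PruneInfo *info;

    if (local_index < 0 || local_index >= local->len)
        return -1;
    info = local->items[local_index];
    info->relids <<= rtoffset;  /* adds rtoffset to every member */
    return info_list_append(global, info);
}

/*
 * Executor-side lookup by index.  Returns NULL when the stored relids do not
 * match those of the plan node asking for the entry.
 */
PruneInfo *
lookup_info(const InfoList *global, int index, unsigned plan_relids)
{
    if (index < 0 || index >= global->len)
        return NULL;
    if (global->items[index]->relids != plan_relids)
        return NULL;
    return global->items[index];
}
```

The point of the cross-check is that an integer index carries no type or identity information of its own, so a stale or miscomputed index would otherwise silently select another node's pruning data.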
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index fb8dba3ab2c..1d27b840ca9 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -853,6 +853,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ff4d9dd1bb3..9c313d81315 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -181,6 +181,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7e71d422a62..16aec59d0ec 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1786,6 +1786,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'relids' identifies the relation to which both the parent plan and the
+ * PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1798,11 +1801,23 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo;
+
+ /* Obtain the pruneinfo we need. */
+ pruneinfo = list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
+ part_prune_index);
+
+ /* Its relids better match the plan node's or the planner messed up. */
+ if (!bms_equal(relids, pruneinfo->relids))
+ elog(ERROR, "wrong pruneinfo with relids=%s found at part_prune_index=%d contained in plan node with relids=%s",
+ bmsToString(pruneinfo->relids), part_prune_index,
+ bmsToString(relids));
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 7c539de5cf2..6aac6f3a872 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -118,6 +118,7 @@ CreateExecutorState(void)
estate->es_rowmarks = NULL;
estate->es_rteperminfos = NIL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 0bd0e4e54d3..986ef34030a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -135,7 +135,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -146,7 +146,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e152c9ee3a0..1468f942388 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -83,7 +83,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -94,7 +94,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 1106cd85f0c..816a2b2a576 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1227,7 +1227,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1378,6 +1377,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1401,16 +1403,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1449,7 +1449,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1542,6 +1541,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1557,13 +1559,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6803edd0854..8a474a50be7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -555,6 +555,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1e7b7bc6ffc..0868249be94 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1731,6 +1731,53 @@ set_customscan_references(PlannerInfo *root,
cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
}
+/*
+ * register_partpruneinfo
+ * Subroutine for set_append_references and set_mergeappend_references
+ *
+ * Add the PartitionPruneInfo from root->partPruneInfos at the given index
+ * into PlannerGlobal->partPruneInfos and return its index there.
+ *
+ * Also update the RT indexes present in PartitionedRelPruneInfos to add the
+ * offset.
+ */
+static int
+register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
+{
+ PlannerGlobal *glob = root->glob;
+ PartitionPruneInfo *pinfo;
+ ListCell *l;
+
+ Assert(part_prune_index >= 0 &&
+ part_prune_index < list_length(root->partPruneInfos));
+ pinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+ part_prune_index);
+
+ pinfo->relids = offset_relid_set(pinfo->relids, rtoffset);
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ prelinfo->initial_pruning_steps =
+ fix_scan_list(root, prelinfo->initial_pruning_steps,
+ rtoffset, 1);
+ prelinfo->exec_pruning_steps =
+ fix_scan_list(root, prelinfo->exec_pruning_steps,
+ rtoffset, 1);
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pinfo);
+
+ return list_length(glob->partPruneInfos) - 1;
+}
+
/*
* set_append_references
* Do set_plan_references processing on an Append
@@ -1783,27 +1830,13 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- pinfo->initial_pruning_steps =
- fix_scan_list(root, pinfo->initial_pruning_steps,
- rtoffset, 1);
- pinfo->exec_pruning_steps =
- fix_scan_list(root, pinfo->exec_pruning_steps,
- rtoffset, 1);
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index =
+ register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1865,27 +1898,13 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- pinfo->initial_pruning_steps =
- fix_scan_list(root, pinfo->initial_pruning_steps,
- rtoffset, 1);
- pinfo->exec_pruning_steps =
- fix_scan_list(root, pinfo->exec_pruning_steps,
- rtoffset, 1);
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index =
+ register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index fa3c5b3c3bb..d3f60cc87c9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -207,16 +207,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -330,10 +334,11 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->relids = bms_copy(parentrel->relids);
pruneinfo->prune_infos = prunerelinfos;
/*
@@ -356,7 +361,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 265f836bcda..13177831d9f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,7 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d0f2dca5928..8ce4430af04 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -655,6 +655,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* List of PartitionPruneInfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 54ee17697e5..52d44f43021 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* "flat" list of PartitionPruneInfos */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -559,6 +562,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 9e19cdd284d..07905d89b8a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in the
+ * plan */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
@@ -278,8 +281,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -313,8 +316,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1413,6 +1416,10 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * relids RelOptInfo.relids of the parent plan node (e.g. Append
+ * or MergeAppend) to which this PartitionPruneInfo node
+ * belongs. The pruning logic ensures that this matches
+ * the parent plan node's apprelids.
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1425,6 +1432,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal, no_query_jumble)
NodeTag type;
+ Bitmapset *relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 3aff23be21d..c413734789a 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
[application/octet-stream] v60-0003-Defer-locking-of-runtime-prunable-relations-in-c.patch (124.8K, 4-v60-0003-Defer-locking-of-runtime-prunable-relations-in-c.patch)
From 23950b3d9eb86317a40a6959808ede9bbe985cd5 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 23 Jan 2025 13:26:01 +0900
Subject: [PATCH v60 3/3] Defer locking of runtime-prunable relations in cached plans
AcquireExecutorLocks() in plancache.c locks all relations in a plan's
range table to ensure the plan is safe for execution. However, this
locks runtime-prunable relations that will later be pruned during
"initial" runtime pruning, introducing unnecessary overhead. This
commit defers locking for such relations and ensures that any
invalidation caused by this deferral triggers replanning when needed.
This results in significant speedups for generic plans with many
runtime-prunable partitions.
* Locking changes:
The planner now tracks "unprunable" relations using the new
PlannedStmt.unprunableRelids field, computed during set_plan_refs()
by subtracting runtime-prunable RT indexes (from PartitionPruneInfos)
from all RT indexes. AcquireExecutorLocks() locks only these
unprunable relations.
During executor startup, ExecDoInitialPruning() identifies unpruned
partitions and locks them. A new es_unpruned_relids field in EState
tracks locked relations. It starts as PlannedStmt.unprunableRelids
and is updated by ExecDoInitialPruning() with RT indexes of unpruned
partitions. To support this, PartitionedRelPruneInfo and
PartitionedRelPruningData now include leafpart_rti_map[] to map
partition indexes to RT indexes.
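The set arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: RelidSet is a hypothetical 64-bit stand-in for PostgreSQL's Bitmapset, and compute_unprunable(), compute_unpruned() and is_locked() are made-up helper names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for Bitmapset: bit i set means RT index i is a
 * member.  Only RT indexes below 64 are representable in this sketch.
 */
typedef uint64_t RelidSet;

static RelidSet
relid_bit(int rti)
{
	return (RelidSet) 1 << rti;
}

/* Planner side: unprunable = all RT indexes minus runtime-prunable ones. */
RelidSet
compute_unprunable(RelidSet all_rtis, RelidSet prunable_rtis)
{
	return all_rtis & ~prunable_rtis;
}

/*
 * Executor side: start from the unprunable set and add the RT indexes of
 * leaf partitions that survived initial pruning (and were locked then).
 */
RelidSet
compute_unpruned(RelidSet unprunable, RelidSet surviving_leaf_rtis)
{
	return unprunable | surviving_leaf_rtis;
}

/* A relation is expected to be locked only if it is in the unpruned set. */
int
is_locked(RelidSet unpruned, int rti)
{
	return (unpruned & relid_bit(rti)) != 0;
}
```

In this model, a parent at RT index 1 with prunable children at RT indexes 2 to 4 yields an unprunable set containing only index 1; if initial pruning keeps only the child at index 3, the unpruned set becomes {1, 3}.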
Executor code working with child result relations and RowMarks is
adjusted to account for deferred locking. ExecInitModifyTable()
trims lists (resultRelations, withCheckOptionLists, returningLists,
updateColnosLists) to include only unpruned partitions, based on
es_unpruned_relids. ResultRelInfo structs are created only for these
unpruned partitions. Similarly, child RowMarks for pruned relations
are skipped, ensuring only unpruned relations are processed.
Trimming result relation lists in ExecInitModifyTable() avoids
unnecessary initialization of ResultRelInfos for pruned partitions.
This improves performance for updates and deletes on partitioned
tables with initial runtime pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks are properly locked. This
ensures unprunableRelids is accurate during planning.
* Plan invalidation handling:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and
handle invalidation caused by deferred locking. If invalidation
occurs, ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the refreshed
plan.
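The retry behavior described above can be modeled with a small standalone sketch. Everything here is a mock written for illustration: MockPlan, mock_executor_start(), mock_update_cached_plan() and start_with_retry() are hypothetical names, and the real ExecutorStartCachedPlan() additionally marks the execution as aborted and calls ExecutorEnd() before replanning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of a cached plan subject to concurrent invalidation. */
typedef struct MockPlan
{
	int			generation;			/* number of times the plan was rebuilt */
	int			invalidations_left; /* pending concurrent-DDL invalidations */
} MockPlan;

/*
 * Mock of ExecutorStart(): returns false when taking the deferred locks
 * revealed that the plan was invalidated.
 */
static bool
mock_executor_start(MockPlan *plan)
{
	if (plan->invalidations_left > 0)
	{
		plan->invalidations_left--;
		return false;
	}
	return true;
}

/* Mock of UpdateCachedPlan(): replaces the stale plan with a fresh one. */
static void
mock_update_cached_plan(MockPlan *plan)
{
	plan->generation++;
}

/* Retry until initialization succeeds; returns the number of replans. */
int
start_with_retry(MockPlan *plan)
{
	int			replans = 0;

	while (!mock_executor_start(plan))
	{
		/* Real code cleans up the aborted execution state here first. */
		mock_update_cached_plan(plan);
		replans++;
	}
	return replans;
}
```

The loop terminates as soon as one initialization attempt completes without an invalidation arriving, which mirrors the "repeat until ExecutorStart() succeeds" contract stated in the commit message.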
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A
new CachedPlan.stmt_context, as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and statements list.
ExecutorStart() and ExecutorStart_hook now return a boolean value
indicating whether plan initialization succeeded with a valid
PlanState tree in QueryDesc.planstate.
* Testing:
The delay_execution module tests scenarios where cached plans become
invalid due to changes in prunable relations after deferred locks.
* Note to extension authors:
ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(). For example:
    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);
    if (!plan_valid)
        return false;
    <extension-code>
    return true;
Extensions inspecting RT indexes should check EState.es_unpruned_relids
to ensure the relation is locked. For example, see how InitPlan()
processes PlannedStmt.rowMarks.
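As a rough illustration of that membership check, the filter below skips parent rowmarks and rowmarks whose RT index is not in the unpruned set. It is a sketch, not extension-ready code: MockRowMark and count_active_rowmarks() are hypothetical, and a plain 64-bit mask stands in for the Bitmapset that real code would test with bms_is_member().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced version of a plan rowmark. */
typedef struct MockRowMark
{
	int			rti;		/* range table index, must be below 64 here */
	bool		isParent;	/* parent rowmarks are irrelevant at runtime */
} MockRowMark;

/*
 * Count the rowmarks that would be processed: those that are not parent
 * rowmarks and whose relation is in the unpruned (hence locked) set.
 */
size_t
count_active_rowmarks(const MockRowMark *marks, size_t nmarks,
					  uint64_t unpruned_relids)
{
	size_t		active = 0;

	for (size_t i = 0; i < nmarks; i++)
	{
		bool		unpruned = (unpruned_relids >> marks[i].rti) & 1;

		if (marks[i].isParent || !unpruned)
			continue;		/* pruned or parent: never touch the relation */
		active++;
	}
	return active;
}
```

The point of the check is ordering: membership in the unpruned set is tested before anything opens or inspects the relation, because a pruned relation was never locked.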
Reviewed-by: Robert Haas
Reviewed-by: David Rowley
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 16 +-
.../pg_stat_statements/pg_stat_statements.c | 16 +-
src/backend/commands/copyfrom.c | 3 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 5 +-
src/backend/commands/explain.c | 22 +-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 5 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/commands/trigger.c | 14 +
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 147 ++++++++-
src/backend/executor/execParallel.c | 13 +-
src/backend/executor/execPartition.c | 120 +++++++-
src/backend/executor/execUtils.c | 20 +-
src/backend/executor/functions.c | 4 +-
src/backend/executor/nodeAppend.c | 8 +-
src/backend/executor/nodeLockRows.c | 9 +-
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/nodeModifyTable.c | 70 ++++-
src/backend/executor/spi.c | 29 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 29 +-
src/backend/partitioning/partprune.c | 22 ++
src/backend/replication/logical/worker.c | 3 +-
src/backend/replication/pgoutput/pgoutput.c | 3 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 51 +++-
src/backend/utils/cache/plancache.c | 204 +++++++++++--
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 6 +-
src/include/commands/trigger.h | 1 +
src/include/executor/execPartition.h | 6 +-
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 37 ++-
src/include/nodes/execnodes.h | 13 +
src/include/nodes/pathnodes.h | 8 +
src/include/nodes/plannodes.h | 7 +
src/include/utils/plancache.h | 50 +++-
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 ++++-
.../expected/cached-plan-inval.out | 282 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 80 +++++
src/test/regress/expected/partition_prune.out | 44 +++
src/test/regress/sql/partition_prune.sql | 18 ++
48 files changed, 1371 insertions(+), 137 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f1ad876e821..82c17c0a28a 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -76,7 +76,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -256,9 +256,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -294,9 +296,13 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
if (auto_explain_enabled())
{
@@ -314,6 +320,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bebf8134eb0..b735381cb0b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -332,7 +332,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -986,13 +986,19 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1015,6 +1021,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f5602..da1e8ddc5a1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -768,7 +768,8 @@ CopyFrom(CopyFromState cstate)
* index-entry-making machinery. (There used to be a huge amount of code
* here that basically duplicated execUtils.c ...)
*/
- ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos);
+ ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos,
+ bms_make_singleton(1));
resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, resultRelInfo, 1);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..091fbc12cc5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -556,7 +556,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -566,7 +566,8 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- ExecutorStart(cstate->queryDesc, 0);
+ if (!ExecutorStart(cstate->queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 23cecd99c9e..44b4665ccd3 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -332,12 +332,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ if (!ExecutorStart(queryDesc, GetIntoRelEFlags(into)))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c24e66f82e1..af25c16d215 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -519,7 +519,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -641,7 +642,9 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -697,7 +700,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -711,8 +714,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Prepare the plan for execution. */
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index ba540e3de5b..1b28d20412e 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -907,11 +907,13 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ if (!ExecutorStart(qdesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
ExecutorRun(qdesc, ForwardScanDirection, 0);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c12817091ed..0bfbc5ca6dc 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,12 +438,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index e7c8171c102..4c2ac045224 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -117,6 +117,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 8989c0c882d..c025b1f9f8c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -582,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int query_index = 0;
if (es->memory)
{
@@ -654,7 +656,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -665,6 +668,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ query_index++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index acf3e4a3f1f..75ea248c3fb 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5124,6 +5124,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be613..449c6068ae9 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+flag. If the plan tree is outdated (is_valid = false), the executor stops
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartCachedPlan
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), the control is
+immediately returned to ExecutorStartCachedPlan(), which will create a new plan
+tree and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 604cb0625b8..727d548881b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,11 +55,13 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -114,11 +116,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Return value indicates if the plan has been initialized successfully so
+ * that queryDesc->planstate contains a valid PlanState tree. It may not
+ * if the plan got invalidated during InitPlan().
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -130,12 +137,70 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
+ plan_valid = (*ExecutorStart_hook) (queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ return plan_valid;
}
+/*
+ * ExecutorStartCachedPlan
+ * Start execution for a given query in the CachedPlanSource, replanning
+ * if the plan is invalidated due to deferred locks taken during the
+ * plan's initialization
+ *
+ * This function handles cases where the CachedPlan given in queryDesc->cplan
+ * might become invalid during the initialization of the plan given in
+ * queryDesc->plannedstmt, particularly when prunable relations in it are
+ * locked after performing initial pruning. If the locks invalidate the plan,
+ * the function calls UpdateCachedPlan() to replan all queries in the
+ * CachedPlan, and then retries initialization.
+ *
+ * The function repeats the process until ExecutorStart() successfully
+ * initializes the plan, that is without the CachedPlan becoming invalid.
+ */
void
+ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (unlikely(queryDesc->cplan == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
+ if (unlikely(plansource == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlanSource");
+
+ /*
+ * Loop and retry with an updated plan until no further invalidation
+ * occurs.
+ */
+ while (1)
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /*
+ * Clean up the current execution state before creating the new
+ * plan to retry ExecutorStart(). Mark execution as aborted to
+ * ensure that AFTER trigger state is properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ /* Retry ExecutorStart() with an updated plan tree. */
+ queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+ }
+ else
+
+ /*
+ * Exit the loop if the plan is initialized successfully and no
+ * sinval messages were received that invalidated the CachedPlan.
+ */
+ break;
+ }
+}
+
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
EState *estate;
@@ -259,6 +324,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ return ExecPlanStillValid(queryDesc->estate);
}
/* ----------------------------------------------------------------
@@ -317,6 +384,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -423,8 +491,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -487,11 +558,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
(PgStat_Counter) estate->es_parallel_workers_launched);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -505,6 +575,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -603,6 +681,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -828,6 +921,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If the plan originates from a CachedPlan (given in queryDesc->cplan),
+ * it can become invalid during runtime "initial" pruning when the
+ * remaining set of locks is taken. The function returns early in that
+ * case without initializing the plan, and the caller is expected to
+ * retry with a new valid plan.
* ----------------------------------------------------------------
*/
static void
@@ -835,6 +934,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -851,9 +951,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
/*
* initialize the node's execution state
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
+ bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -864,9 +966,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* executed, are saved in es_part_prune_results. These results correspond
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
*/
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -881,8 +989,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Relation relation;
ExecRowMark *erm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* get relation's OID (will produce InvalidOid if subquery) */
@@ -2862,6 +2975,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2933,6 +3049,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
}
}
+ /*
+ * Copy es_unpruned_relids so that RowMarks of pruned relations are
+ * ignored in ExecInitLockRows() and ExecInitModifyTable() when
+ * initializing the plan trees below.
+ */
+ rcestate->es_unpruned_relids = parentestate->es_unpruned_relids;
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9c313d81315..1bedb808368 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
+ pstmt->unprunableRelids = estate->es_unpruned_relids;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -1257,8 +1258,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1439,7 +1447,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ if (!ExecutorStart(queryDesc, fpes->eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 70f11913ad4..0d28cc45f8c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -182,7 +183,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -196,7 +198,8 @@ static void InitExecPartitionPruneContexts(PartitionPruneState *prunstate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis);
/*
@@ -1766,7 +1769,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * all plan nodes that contain a PartitionPruneInfo.
+ * all plan nodes that contain a PartitionPruneInfo. This also locks the
+ * leaf partitions whose subnodes will be initialized if needed.
*
* ExecInitPartitionExecPruning:
* Updates the PartitionPruneState found at given part_prune_index in
@@ -1787,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * plan nodes that support partition pruning.
+ * plan nodes that support partition pruning. This also locks the leaf
+ * partitions whose subnodes will be initialized if needed.
*
* This function iterates over each PartitionPruneInfo entry in
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
@@ -1814,15 +1820,19 @@ void
ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
+ Bitmapset *all_leafpart_rtis = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo,
+ &all_leafpart_rtis);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
@@ -1831,10 +1841,45 @@ ExecDoInitialPruning(EState *estate)
* bitmapset or NULL as described in the header comment.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ else
+ validsubplan_rtis = all_leafpart_rtis;
+
+ if (ExecShouldLockRelations(estate))
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ locked_relids = lappend_int(locked_relids, rtindex);
+ }
+ }
+ estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
+ validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
+
+ /*
+ * Release the useless locks if the plan won't be executed. This is the
+ * same as what CheckCachedPlan() in plancache.c does.
+ */
+ if (!ExecPlanStillValid(estate))
+ {
+ foreach(lc, locked_relids)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
/*
@@ -1944,9 +1989,15 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* initialized here. Those required for exec pruning are initialized later in
* ExecInitPartitionExecPruning(), as they depend on the availability of the
* parent plan node's PlanState.
+ *
+ * On return, *all_leafpart_rtis will contain the RT indexes of all leaf
+ * partitions if initial pruning steps are skipped (e.g., during EXPLAIN
+ * (GENERIC_PLAN)). The caller is responsible for adding these RT indexes
+ * to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2039,8 +2090,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* The set of partitions that exist now might not be the same that
* existed when the plan was made. The normal case is that it is;
* optimize for that case with a quick comparison, and just copy
- * the subplan_map and make subpart_map point to the one in
- * PruneInfo.
+ * the subplan_map and make subpart_map and leafpart_rti_map point
+ * to the ones in PruneInfo.
*
* For the case where they aren't identical, we could have more
* partitions on either side; or even exactly the same number of
@@ -2059,6 +2110,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
sizeof(int) * partdesc->nparts) == 0)
{
pprune->subpart_map = pinfo->subpart_map;
+ pprune->leafpart_rti_map = pinfo->leafpart_rti_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
}
@@ -2079,6 +2131,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* mismatches.
*/
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->leafpart_rti_map = palloc(sizeof(int) * partdesc->nparts);
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
@@ -2096,6 +2149,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->leafpart_rti_map[pp_idx] =
+ pinfo->leafpart_rti_map[pd_idx];
pd_idx++;
continue;
}
@@ -2133,6 +2188,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map[pp_idx] = -1;
pprune->subplan_map[pp_idx] = -1;
+ pprune->leafpart_rti_map[pp_idx] = 0;
}
}
@@ -2176,6 +2232,25 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
prunestate->execparamids = bms_add_members(prunestate->execparamids,
pinfo->execparamids);
+ /*
+ * Return the RT indexes of all leaf partitions if we're skipping
+ * initial pruning in the EXPLAIN (GENERIC_PLAN) case.
+ */
+ if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
+ {
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
+ rtindex);
+ }
+ }
+
j++;
}
i++;
@@ -2442,10 +2517,15 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * The caller must pass a non-NULL validsubplan_rtis during initial pruning
+ * to collect the RT indexes of leaf partitions whose subnodes will be
+ * executed. These RT indexes are later added to EState.es_unpruned_relids.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2457,6 +2537,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* evaluated *and* there are steps in which to do so.
*/
Assert(initial_prune || prunestate->do_exec_prune);
+ Assert(validsubplan_rtis != NULL || !initial_prune);
/*
* Switch to a temp context to avoid leaking memory in the executor's
@@ -2480,7 +2561,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, validsubplan_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_context.initialized)
@@ -2497,6 +2578,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_copy(*validsubplan_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2507,13 +2590,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of their corresponding leaf partitions to *validsubplan_rtis if
+ * it's non-NULL.
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *partset;
int i;
@@ -2540,8 +2626,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
+ pprune->leafpart_rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2549,7 +2640,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ validsubplan_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 6aac6f3a872..67926178759 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -771,7 +772,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
* indexed by rangetable index.
*/
void
-ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
+ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids)
{
/* Remember the range table List as-is */
estate->es_range_table = rangeTable;
@@ -782,6 +784,15 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
/* Set size of associated arrays */
estate->es_range_table_size = list_length(rangeTable);
+ /*
+ * Initialize the bitmapset of RT indexes (es_unpruned_relids)
+ * representing relations that will be scanned during execution. This set
+ * is initially populated by the caller and may be extended later by
+ * ExecDoInitialPruning() to include RT indexes of unpruned leaf
+ * partitions.
+ */
+ estate->es_unpruned_relids = unpruned_relids;
+
/*
* Allocate an array to store an open Relation corresponding to each
* rangetable entry, and initialize entries to NULL. Relations are opened
@@ -803,6 +814,10 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed in ExecEndPlan().
+ *
+ * Note: The caller must ensure that 'rti' refers to an unpruned relation
+ * (i.e., it is a member of estate->es_unpruned_relids) before calling this
+ * function. Attempting to open a pruned relation will result in an error.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -811,6 +826,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ if (!bms_is_member(rti, estate->es_unpruned_relids))
+ elog(ERROR, "trying to open a pruned relation");
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 757f8068e21..6aa8e9c4d8a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -864,7 +865,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+ if (!ExecutorStart(es->qd, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 2397e5e17b0..15c4227cc62 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -595,7 +595,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -662,7 +662,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -738,7 +738,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -891,7 +891,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 4e4e3db0b38..a8afbf93b48 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -347,8 +347,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at runtime.
+ * Also ignore the rowmarks belonging to child tables that have been
+ * pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index b2dc6626c99..405e8f94285 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -233,7 +233,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bc82e035ba2..349ed2d6d2c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -690,7 +690,7 @@ ExecInitUpdateProjection(ModifyTableState *mtstate,
Assert(whichrel >= 0 && whichrel < mtstate->mt_nrels);
}
- updateColnos = (List *) list_nth(node->updateColnosLists, whichrel);
+ updateColnos = (List *) list_nth(mtstate->mt_updateColnosLists, whichrel);
/*
* For UPDATE, we use the old tuple to fill up missing values in the tuple
@@ -4453,7 +4453,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ModifyTableState *mtstate;
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
- int nrels = list_length(node->resultRelations);
+ int nrels;
+ List *resultRelations = NIL;
+ List *withCheckOptionLists = NIL;
+ List *returningLists = NIL;
+ List *updateColnosLists = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4463,6 +4467,45 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+ /*
+ * Only consider unpruned relations for initializing their ResultRelInfo
+ * struct and other fields such as withCheckOptions, etc.
+ */
+ i = 0;
+ foreach(l, node->resultRelations)
+ {
+ Index rti = lfirst_int(l);
+
+ if (bms_is_member(rti, estate->es_unpruned_relids))
+ {
+ resultRelations = lappend_int(resultRelations, rti);
+ if (node->withCheckOptionLists)
+ {
+ List *withCheckOptions = list_nth_node(List,
+ node->withCheckOptionLists,
+ i);
+
+ withCheckOptionLists = lappend(withCheckOptionLists, withCheckOptions);
+ }
+ if (node->returningLists)
+ {
+ List *returningList = list_nth_node(List,
+ node->returningLists,
+ i);
+
+ returningLists = lappend(returningLists, returningList);
+ }
+ if (node->updateColnosLists)
+ {
+ List *updateColnosList = list_nth(node->updateColnosLists, i);
+
+ updateColnosLists = lappend(updateColnosLists, updateColnosList);
+ }
+ }
+ i++;
+ }
+ nrels = list_length(resultRelations);
+
/*
* create state structure
*/
@@ -4483,6 +4526,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->mt_merge_inserted = 0;
mtstate->mt_merge_updated = 0;
mtstate->mt_merge_deleted = 0;
+ mtstate->mt_updateColnosLists = updateColnosLists;
/*----------
* Resolve the target relation. This is the same as:
@@ -4500,6 +4544,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
if (node->rootRelation > 0)
{
+ Assert(bms_is_member(node->rootRelation, estate->es_unpruned_relids));
mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
node->rootRelation);
@@ -4514,7 +4559,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
- node->epqParam, node->resultRelations);
+ node->epqParam, resultRelations);
mtstate->fireBSTriggers = true;
/*
@@ -4532,7 +4577,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
resultRelInfo = mtstate->resultRelInfo;
i = 0;
- foreach(l, node->resultRelations)
+ foreach(l, resultRelations)
{
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
@@ -4676,7 +4721,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize any WITH CHECK OPTION constraints if needed.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->withCheckOptionLists)
+ foreach(l, withCheckOptionLists)
{
List *wcoList = (List *) lfirst(l);
List *wcoExprs = NIL;
@@ -4699,7 +4744,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/*
* Initialize RETURNING projections if needed.
*/
- if (node->returningLists)
+ if (returningLists)
{
TupleTableSlot *slot;
ExprContext *econtext;
@@ -4708,7 +4753,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize result tuple slot and assign its rowtype using the first
* RETURNING list. We assume the rest will look the same.
*/
- mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
+ mtstate->ps.plan->targetlist = (List *) linitial(returningLists);
/* Set up a slot for the output of the RETURNING projection(s) */
ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
@@ -4723,7 +4768,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->returningLists)
+ foreach(l, returningLists)
{
List *rlist = (List *) lfirst(l);
@@ -4824,8 +4869,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at runtime.
+ * Also ignore the rowmarks belonging to child tables that have been
+ * pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* Find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index ecb2e4ccaa1..3288396def3 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1685,7 +1686,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2500,6 +2502,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int query_index = 0;
spicallbackarg.query = plansource->query_string;
@@ -2690,14 +2693,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, query_index);
FreeQueryDesc(qdesc);
}
else
@@ -2794,6 +2799,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ query_index++;
}
/* Done with this plan, so release refcount */
@@ -2871,7 +2878,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2927,7 +2935,16 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
ExecutorRun(queryDesc, ForwardScanDirection, tcount);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8a474a50be7..e5e02e86b24 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -557,6 +557,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(glob->allRelids,
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 0868249be94..39a85c30083 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -564,7 +564,8 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
/*
* If it's a plain relation RTE (or a subquery that was once a view
- * reference), add the relation OID to relationOids.
+ * reference), add the relation OID to relationOids. Also add its new RT
+ * index to the set of relations that need to be locked for execution.
*
* We do this even though the RTE might be unreferenced in the plan tree;
* this would correspond to cases such as views that were expanded, child
@@ -576,7 +577,11 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
*/
if (newrte->rtekind == RTE_RELATION ||
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
+ {
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ glob->allRelids = bms_add_member(glob->allRelids,
+ list_length(glob->finalrtable));
+ }
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
@@ -1740,6 +1745,11 @@ set_customscan_references(PlannerInfo *root,
*
* Also update the RT indexes present in PartitionedRelPruneInfos to add the
* offset.
+ *
+ * Finally, if there are initial pruning steps, add the RT indexes of the
+ * leaf partitions to the set of relations that are prunable at execution
+ * startup time. This set indicates which relations should not be locked
+ * before executor startup, as they may be pruned during initial pruning.
*/
static int
register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
@@ -1762,6 +1772,7 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ int i;
prelinfo->rtindex += rtoffset;
prelinfo->initial_pruning_steps =
@@ -1770,6 +1781,22 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
prelinfo->exec_pruning_steps =
fix_scan_list(root, prelinfo->exec_pruning_steps,
rtoffset, 1);
+
+ for (i = 0; i < prelinfo->nparts; i++)
+ {
+ /*
+ * Non-leaf partitions and partitions that do not have a
+ * subplan are not included in this map as mentioned in
+ * make_partitionedrel_pruneinfo().
+ */
+ if (prelinfo->leafpart_rti_map[i])
+ {
+ prelinfo->leafpart_rti_map[i] += rtoffset;
+ if (prelinfo->initial_pruning_steps)
+ glob->prunableRelids = bms_add_member(glob->prunableRelids,
+ prelinfo->leafpart_rti_map[i]);
+ }
+ }
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 4693eef0c58..156995065ca 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -645,6 +645,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ int *leafpart_rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -657,6 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ leafpart_rti_map = (int *) palloc0(nparts * sizeof(int));
present_parts = NULL;
i = -1;
@@ -671,9 +673,28 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of "leaf" partitions so they can be
+ * included in the PlannerGlobal.prunableRelids set, indicating
+ * relations whose locking is deferred until executor startup.
+ *
+ * We don't defer locking of sub-partitioned partitions because
+ * setting up PartitionedRelPruningData currently occurs before
+ * initial pruning, so the relation must be locked at that stage,
+ * even if it may be pruned.
+ *
+ * Only leaf partitions with a valid subplan that are prunable
+ * using initial pruning are added to prunableRelids. So
+ * partitions without a subplan due to constraint exclusion will
+ * remain in PlannedStmt.unprunableRelids and thus their locking
+ * will not be deferred even if they may ultimately be pruned due
+ * to initial pruning.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ leafpart_rti_map[i] = (int) partrel->relid;
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -695,6 +716,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->leafpart_rti_map = leafpart_rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 334bf3e7aff..dacae08dd97 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -668,7 +668,8 @@ create_edata_for_relation(LogicalRepRelMapEntry *rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 2b7499b34b9..767176c26d4 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -815,7 +815,8 @@ create_estate_for_relation(Relation rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
estate->es_output_cid = GetCurrentCommandId(false);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5655348a2e2..f60f2785bc1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1224,6 +1224,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2025,7 +2026,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6f22496305a..dea24453a6c 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -36,6 +37,9 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +69,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +82,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +128,9 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +143,9 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,14 +157,23 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, 0, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* Run the plan to completion.
@@ -493,6 +514,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -512,9 +534,19 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (portal->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, myeflags,
+ portal->plansource, 0);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, myeflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* This tells PortalCleanup to shut down the executor
@@ -1188,6 +1220,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int query_index = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1269,6 +1302,9 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1278,6 +1314,9 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1342,6 +1381,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ query_index++;
}
/* Pop the snapshot if we pushed one. */
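
For readers following the pquery.c changes: ExecutorStart() now returns a bool, and callers holding a CachedPlan go through ExecutorStartCachedPlan() and then assert that queryDesc->planstate is set. The body of ExecutorStartCachedPlan() is not part of this hunk, so the following is only a minimal sketch of the retry pattern those call sites appear to rely on. Every type and function name here (MockPlan, MockQueryDesc, the mock_* functions) is a hypothetical stand-in, not a PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CachedPlan and QueryDesc; not the real structs. */
typedef struct MockPlan
{
    bool is_valid;      /* mirrors the idea of CachedPlan.is_valid */
    int  generation;    /* bumped each time the plan is rebuilt */
} MockPlan;

typedef struct MockQueryDesc
{
    MockPlan *cplan;                 /* NULL when the plan is not cached */
    bool      planstate_ready;       /* stands in for queryDesc->planstate */
    int       invalidations_pending; /* simulated concurrent invalidations */
} MockQueryDesc;

/*
 * Assumed behavior: startup reports failure (false) when locking the
 * relations that survive pruning reveals that the cached plan is stale.
 */
static bool
mock_executor_start(MockQueryDesc *qd)
{
    if (qd->cplan != NULL && qd->invalidations_pending > 0)
    {
        qd->invalidations_pending--;
        qd->cplan->is_valid = false;
        return false;
    }
    qd->planstate_ready = true;
    return true;
}

/* Assumed behavior: replace the stale plan in place and mark it valid. */
static void
mock_update_cached_plan(MockPlan *plan)
{
    assert(!plan->is_valid);
    plan->generation++;
    plan->is_valid = true;
}

/* Retry startup until it succeeds; returns the number of replans needed. */
static int
mock_executor_start_cached_plan(MockQueryDesc *qd)
{
    int replans = 0;

    assert(qd->cplan != NULL);
    while (!mock_executor_start(qd))
    {
        mock_update_cached_plan(qd->cplan);
        replans++;
    }
    return replans;
}
```

In this sketch, the non-cached path would call mock_executor_start() once and treat a false result as an error, which matches the "failed unexpectedly" elog() in the patched call sites.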
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 55db8f53705..71839dca108 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,7 +101,8 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ bool release_generic);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -578,10 +579,17 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
+ *
+ * This also releases and drops the generic plan (plansource->gplan), if any,
+ * as most callers will typically build a new CachedPlan for the plansource
+ * right after this. However, when called from UpdateCachedPlan(), the
+ * function does not release the generic plan, as UpdateCachedPlan() updates
+ * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv)
+ QueryEnvironment *queryEnv,
+ bool release_generic)
{
bool snapshot_set;
RawStmt *rawtree;
@@ -678,8 +686,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference if any */
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference, if any, and if requested */
+ if (release_generic)
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -815,8 +824,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. However, the plans are not fully
+ * race-condition-free until the executor acquires locks on the prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -870,7 +882,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
- /* Successfully revalidated and locked the query. */
+ /*
+ * Successfully revalidated and locked the query. Set is_reused
+ * to true so that CachedPlanRequiresLocking() returns true.
+ */
+ plan->is_reused = true;
return true;
}
@@ -895,12 +911,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* To build a generic, parameter-value-independent plan, pass NULL for
* boundParams. To build a custom plan, pass the actual parameter values via
* boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * each parameter value; otherwise the planner will treat the value as a hint
+ * rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -911,6 +929,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
+ MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -928,7 +947,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/*
* If we don't already have a copy of the querytree list that can be
@@ -967,10 +986,19 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally we make a dedicated memory context for the CachedPlan and its
- * subsidiary data. (It's probably not going to be large, but just in
- * case, allow it to grow large. It's transient for the moment.) But for
- * a one-shot plan, we just leave it in the caller's memory context.
+ * Normally, we create a dedicated memory context for the CachedPlan and
+ * its subsidiary data. Although it's usually not very large, the context
+ * is designed to allow growth if necessary.
+ *
+ * The PlannedStmts are stored in a separate child context (stmt_context)
+ * of the CachedPlan's memory context. This separation allows
+ * UpdateCachedPlan() to free and replace the PlannedStmts without
+ * affecting the CachedPlan structure or its stmt_list List.
+ *
+ * For one-shot plans, we instead use the caller's memory context, as the
+ * CachedPlan will not persist. stmt_context will be set to NULL in this
+ * case, because UpdateCachedPlan() should never get called on a one-shot
+ * plan.
*/
if (!plansource->is_oneshot)
{
@@ -979,12 +1007,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- /*
- * Copy plan into the new context.
- */
- MemoryContextSwitchTo(plan_context);
+ stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan PlannedStmts",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
+ MemoryContextSetParent(stmt_context, plan_context);
+ MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
+
+ MemoryContextSwitchTo(plan_context);
+ plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1025,8 +1058,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
+ plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
+ plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1153,8 +1188,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid, but not all locks are acquired if the
+ * returned plan is a reused generic plan. In such cases, locks on relations
+ * subject to initial runtime pruning are not taken by CheckCachedPlan() but
+ * deferred until the execution startup phase, specifically when
+ * ExecDoInitialPruning() performs initial pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1180,7 +1218,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1276,6 +1314,113 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * UpdateCachedPlan
+ * Create fresh plans for all queries in the CachedPlanSource, replacing
+ * those in the generic plan's stmt_list, and return the plan for the
+ * query_index'th query.
+ *
+ * This function is primarily used by ExecutorStartCachedPlan() to handle
+ * cases where the original generic CachedPlan becomes invalid. Such
+ * invalidation may occur when prunable relations in the old plan for the
+ * query_index'th query are locked in preparation for execution.
+ *
+ * Note that invalidations received during the execution of the query_index'th
+ * query can affect both the queries that have already finished execution
+ * (e.g., due to concurrent modifications on prunable relations that were not
+ * locked during their execution) and also the queries that have not yet been
+ * executed. As a result, this function updates all plans to ensure
+ * CachedPlan.is_valid is safely set to true.
+ *
+ * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
+ * the caller and any of its callers must not rely on them remaining accessible
+ * after this function is called.
+ */
+PlannedStmt *
+UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ ListCell *l1,
+ *l2;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
+ else if (plan->is_oneshot)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan() returned
+ * the CachedPlan. See the comment in BuildCachedPlan() for details on why
+ * this might happen. Although invalidation is likely a false positive as
+ * stated there, we revalidate the query to ensure the query list used for
+ * planning is up to date.
+ *
+ * The risk of catching an invalidation is higher here than when
+ * BuildCachedPlan() is called from GetCachedPlan(), because this function
+ * is normally called long after GetCachedPlan() returns the CachedPlan,
+ * so much more processing could have occurred including things that mark
+ * the CachedPlanSource invalid.
+ *
+ * Note: Do not release plansource->gplan, because the upstream callers
+ * (such as the callers of ExecutorStartCachedPlan()) would still be
+ * referencing it.
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv, false);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for all the queries after making a copy to be
+ * scribbled on by the planner.
+ */
+ query_list = copyObject(query_list);
+
+ /*
+ * Planning work is done in the caller's memory context. The resulting
+ * PlannedStmts are then copied into plan->stmt_context after throwing away
+ * the old ones.
+ */
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+ Assert(list_length(plan_list) == list_length(plan->stmt_list));
+
+ MemoryContextReset(plan->stmt_context);
+ oldcxt = MemoryContextSwitchTo(plan->stmt_context);
+ forboth(l1, plan_list, l2, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst(l1);
+
+ lfirst(l2) = copyObject(plannedstmt);
+ }
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * XXX Should this also (re)set the properties of the CachedPlan that are
+ * set in BuildCachedPlan() after creating the fresh plans such as
+ * planRoleId, dependsOnRole, and saved_xmin?
+ */
+
+ /*
+ * We've updated all the plans that might have been invalidated, so mark
+ * the CachedPlan as valid.
+ */
+ plan->is_valid = true;
+
+ /* Also update generic_cost because we just created a new generic plan. */
+ plansource->generic_cost = cached_plan_cost(plan, false);
+
+ return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1654,7 +1799,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv);
+ RevalidateCachedQuery(plansource, queryEnv, true);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -1776,7 +1921,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,13 +1939,16 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
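
The AcquireExecutorLocks() hunk above switches from walking the whole range table to walking only the members of unprunableRelids, converting each 1-based RT index into a 0-based list position. A minimal sketch of that iteration pattern follows. It assumes a 64-bit mask in place of a Bitmapset and a plain array in place of the rtable List; mock_next_member() and collect_lock_targets() are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for bms_next_member(): return the smallest member
 * greater than prevbit, or a negative value when there are no more members.
 */
static int
mock_next_member(uint64_t set, int prevbit)
{
    for (int bit = prevbit + 1; bit < 64; bit++)
    {
        if (set & (UINT64_C(1) << bit))
            return bit;
    }
    return -2;
}

/*
 * Collect the relation OIDs to lock. RT indexes are 1-based, so the entry
 * for rtindex lives at array position rtindex - 1. Returns the number of
 * OIDs written to out[].
 */
static int
collect_lock_targets(uint64_t unprunable_relids,
                     const unsigned *rtable_oids, int rtable_len,
                     unsigned *out)
{
    int n = 0;
    int rtindex = -1;

    while ((rtindex = mock_next_member(unprunable_relids, rtindex)) >= 0)
    {
        /* Every member must name an existing range table entry. */
        assert(rtindex >= 1 && rtindex <= rtable_len);
        out[n++] = rtable_oids[rtindex - 1];
    }
    return n;
}
```

Entries whose RT index is absent from the set are never visited, which is why the patch can turn the old rtekind filter into an Assert.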
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 0be1c2b0fff..e3526e78064 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f4..570e7cad1fa 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,10 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 2ed2c4bb378..4180601dcd4 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 855fed4fea5..951009cf46c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -48,6 +48,8 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * leafpart_rti_map RT index by partition index, or 0 if not a leaf
+ * partition.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -65,6 +67,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ int *leafpart_rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -135,6 +138,7 @@ extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..ba53305ad42 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c7db6defd3e..c055b4436bc 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -72,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -191,8 +192,11 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -255,6 +259,30 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
+
+/*
+ * Locks are needed only if running a cached plan that might contain unlocked
+ * relations, such as a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -595,7 +623,8 @@ extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
-extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
+extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids);
extern void ExecCloseRangeTableRelations(EState *estate);
extern void ExecCloseResultRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index aca15f771a2..9519dca374b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -655,9 +656,14 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
+ Bitmapset *es_unpruned_relids; /* PlannedStmt.unprunableRelids + RT
+ * indexes of leaf partitions that survive
+ * initial pruning; see
+ * ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -703,6 +709,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
@@ -1440,6 +1447,12 @@ typedef struct ModifyTableState
double mt_merge_inserted;
double mt_merge_updated;
double mt_merge_deleted;
+
+ /*
+ * List of valid updateColnosLists. Contains only those belonging to
+ * unpruned relations from ModifyTable.updateColnosLists.
+ */
+ List *mt_updateColnosLists;
} ModifyTableState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 52d44f43021..2fe5179ca77 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,14 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of all relation RTEs in finalrtable (RTE_RELATION and
+ * RTE_SUBQUERY RTEs of views) and of those that are subject to runtime
+ * pruning at plan initialization time ("initial" pruning).
+ */
+ Bitmapset *allRelids;
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 07905d89b8a..8472f0564e3 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; set for
+ * AcquireExecutorLocks(). */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1475,6 +1479,9 @@ typedef struct PartitionedRelPruneInfo
/* subpart index by partition index, or -1 */
int *subpart_map pg_node_attr(array_size(nparts));
+ /* RT index by partition index, or 0 if not a leaf partition */
+ int *leafpart_rti_map pg_node_attr(array_size(nparts));
+
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 46072d311b1..2d83f7d4930 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,8 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
+#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -139,10 +141,11 @@ typedef struct CachedPlanSource
* The reference count includes both the link from the parent CachedPlanSource
* (if any), and any active plan executions, so the plan can be discarded
* exactly when refcount goes to zero. Both the struct itself and the
- * subsidiary data live in the context denoted by the context field.
- * This makes it easy to free a no-longer-needed cached plan. (However,
- * if is_oneshot is true, the context does not belong solely to the CachedPlan
- * so no freeing is possible.)
+ * subsidiary data, except the PlannedStmts in stmt_list, live in the context
+ * denoted by the context field; the PlannedStmts live in the context denoted
+ * by stmt_context. Separate contexts make it easy to free a no-longer-needed
+ * cached plan. (However, if is_oneshot is true, the context does not belong
+ * solely to the CachedPlan so no freeing is possible.)
*/
typedef struct CachedPlan
{
@@ -150,6 +153,7 @@ typedef struct CachedPlan
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
+ bool is_reused; /* is it a reused generic plan? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
@@ -158,6 +162,10 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext stmt_context; /* context containing the PlannedStmts in
+ * stmt_list, but not the List itself which is
+ * in the above context; NULL if is_oneshot is
+ * true. */
} CachedPlan;
/*
@@ -223,6 +231,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -235,4 +247,34 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire additional locks?
+ *
+ * If the plan is a reused generic plan, the executor must acquire locks for
+ * relations that are not covered by AcquireExecutorLocks(), such as partitions
+ * that are subject to initial runtime pruning.
+ *
+ * Note: These locks are unnecessary if the plan is executed immediately after
+ * its creation, since the planner would have already acquired them. However,
+ * we do not optimize for that case.
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return !cplan->is_oneshot && cplan->is_reused;
+}
+
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check if the plan has not been invalidated after
+ * taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 0b62143af8b..ddee031f551 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -240,7 +241,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846da..3eeb097fde4 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7bc97f84a1c..844af6bd061 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2025, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/fmgrprotos.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 00000000000..5bfb2b33b39
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,282 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(56 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(41 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Index Scan using foo12_1_a on foo12_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(9 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Disabled: true
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(10 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index b53488f76d2..58159bfc574 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 00000000000..f27e8fb521c
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,80 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE TABLE bar (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE bar1 PARTITION OF bar FOR VALUES IN (1);
+ CREATE INDEX ON bar1(a);
+ CREATE TABLE bar2 PARTITION OF bar FOR VALUES IN (2);
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO UPDATE bar SET a = a WHERE a = one();
+ CREATE RULE update_bar AS ON UPDATE TO bar DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo, bar;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Another case with Append with run-time pruning in a subquery
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2dropi" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index f0707e7f7ea..7c0c40117ae 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4469,3 +4469,47 @@ drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (1, 'd');
+ a | b | c
+---+---+---
+ 1 | d | t
+(1 row)
+
+explain (costs off) execute update_part_abc_view (2, 'a');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_2 part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_2 part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (2, 'a');
+ERROR: new row violates check option for view "part_abc_view"
+DETAIL: Failing row contains (2, a, t).
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ea9a4fe4a23..06620640f87 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1354,3 +1354,21 @@ drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+execute update_part_abc_view (1, 'd');
+explain (costs off) execute update_part_abc_view (2, 'a');
+execute update_part_abc_view (2, 'a');
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
--
2.43.0
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2025-01-31 08:31 ` Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2025-01-31 08:31 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Thu, Jan 23, 2025 at 4:15 PM Amit Langote <[email protected]> wrote:
> I’ve rebased over recent changes to setrefs.c (commit bf826ea0629).
> During the rebase, I realized that the patch
> 0002-Initialize-PartitionPruneContexts-lazily wasn’t a good idea after
> all.
>
> The test case added by bf826ea0629 highlighted an issue: initializing
> pruning expressions lazily during execution could leave the
> Append/MergeAppend node’s PlanState.subPlan uninitialized at
> ExecInitNode() time. Initially, I thought this would have only
> cosmetic consequences -- such as changes in test case output where
> SubPlans referenced in "exec" pruning expressions wouldn’t appear --
> but I may have underestimated the problem. As a result, I’ve abandoned
> that approach and the patch in favor of initializing all pruning
> expressions during plan initialization.
>
> Additionally, I revisited the impact of the main patch on
> ExecutorStart_hooks. It seems better to change the return type from
> void to bool, returning the result of
> ExecPlanStillValid(queryDesc->estate). This change has the added
> benefit of breaking extensions that use ExecutorStart_hook at compile
> time, encouraging authors to update their code. The updated commit
> message includes details on additional checks extensions must
> implement, particularly for cases where they might access pruned and
> thus unlocked relations.
>
> I've stared at the refactoring patches 0001 and 0002 for long enough
> at this point that I'd like to commit them early next week, barring
> further comments or objections. I'll keep staring at 0003.
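To make the hook change in the quoted text concrete, here is a minimal standalone sketch of an ExecutorStart-style hook that returns bool and only does its own setup when the plan is still valid. It is not PostgreSQL code: ToyQueryDesc, toy_ExecutorStart() and the other toy_* names are hypothetical stand-ins, and the plan_still_valid flag merely imitates what ExecPlanStillValid(queryDesc->estate) would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QueryDesc; only what this sketch needs. */
typedef struct ToyQueryDesc
{
    bool plan_still_valid;  /* imitates ExecPlanStillValid(estate) */
    int  hook_calls;        /* counts extension-specific setup runs */
} ToyQueryDesc;

typedef bool (*toy_ExecutorStart_hook_type) (ToyQueryDesc *queryDesc,
                                             int eflags);

static toy_ExecutorStart_hook_type toy_ExecutorStart_hook = NULL;
static toy_ExecutorStart_hook_type prev_toy_hook = NULL;

/* Stand-in for standard_ExecutorStart(): reports plan validity. */
static bool
toy_standard_ExecutorStart(ToyQueryDesc *queryDesc, int eflags)
{
    (void) eflags;
    return queryDesc->plan_still_valid;
}

/* Extension hook: chain first, then act only if the plan is valid. */
static bool
my_ExecutorStart(ToyQueryDesc *queryDesc, int eflags)
{
    bool plan_valid;

    assert(queryDesc != NULL);

    if (prev_toy_hook)
        plan_valid = prev_toy_hook(queryDesc, eflags);
    else
        plan_valid = toy_standard_ExecutorStart(queryDesc, eflags);

    /* Skip extension work for an invalidated plan; the caller replans. */
    if (plan_valid)
        queryDesc->hook_calls++;

    return plan_valid;
}

/* Dispatcher: call the installed hook, else the standard function. */
static bool
toy_ExecutorStart(ToyQueryDesc *queryDesc, int eflags)
{
    if (toy_ExecutorStart_hook)
        return toy_ExecutorStart_hook(queryDesc, eflags);
    return toy_standard_ExecutorStart(queryDesc, eflags);
}

/* Install the hook once, remembering any previous user. */
static void
toy_install_hook(void)
{
    prev_toy_hook = toy_ExecutorStart_hook;
    toy_ExecutorStart_hook = my_ExecutorStart;
}
```

Because the real hook type changes from void to bool, an extension still declaring a void hook would fail to compile, which is the compile-time breakage mentioned above.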
I have now pushed 0001 and 0002.

I broke 0003 into two patches:

1. Patch to track unpruned relations in the executor, allowing the
overhead of processing pruned partitions to be skipped during plan
initialization. This is particularly relevant for top-level nodes such
as ModifyTable and LockRows, which -- unlike Append / MergeAppend --
do not ignore initially pruned partitions. Since initial pruning is
now performed separately from plan initialization and earlier in
InitPlan(), we can fix this by checking whether a given child result
relation or RowMark belongs to a pruned partition and skipping it.

2. Patch to defer locking of prunable relations from GetCachedPlan() to
InitPlan(), preventing partitions pruned by initial pruning from being
locked unnecessarily.

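The skip check described for child RowMarks can be sketched in isolation as follows. This is a toy model, not the patch itself: ToyRelids is a hypothetical 64-bit mask standing in for a Bitmapset such as es_unpruned_relids, and ToyRowMark keeps only the two fields the check looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set => RT index i is unpruned. */
typedef uint64_t ToyRelids;

/* Stand-in for bms_is_member(); limited to RT indexes 0..63 in this toy. */
static bool
toy_is_member(int rti, ToyRelids set)
{
    assert(rti >= 0 && rti < 64);
    return ((set >> rti) & 1u) != 0;
}

/* Only the PlanRowMark fields that matter for the skip decision. */
typedef struct ToyRowMark
{
    int  rti;       /* range table index of the marked relation */
    bool isParent;  /* "parent" rowmarks are irrelevant at runtime */
} ToyRowMark;

/*
 * Count the rowmarks that would actually be initialized: parent rowmarks
 * and rowmarks of relations removed by initial pruning are skipped.
 */
static int
toy_count_rowmarks_to_init(const ToyRowMark *marks, int nmarks,
                           ToyRelids unpruned)
{
    int n = 0;

    for (int i = 0; i < nmarks; i++)
    {
        if (marks[i].isParent ||
            !toy_is_member(marks[i].rti, unpruned))
            continue;
        n++;
    }
    return n;
}
```

With one parent rowmark and three child rowmarks of which a single child survives pruning, only that one child would be set up, which is where the saved initialization work comes from.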
With the attached 0001, I can see that saving the overhead of
initializing ResultRelInfos for pruned partitions in
ExecInitModifyTable() results in a noticeable speedup for pgbench
-Mprepared with partitions, especially at higher partition counts
where the overhead is more significant. The numbers I have here are a
bit noisy, but they provide a general idea of the performance benefit
of skipping initially pruned partitions during plan initialization.
Setup:
plan_cache_mode = force_generic_plan
max_locks_per_transaction = 1000
for i in 100 200 500 1000 2000; do
echo -ne "$i\t"
pgbench -i --partitions=$i > /dev/null 2>&1;
pgbench -n -Mprepared -T 10 | grep tps;
done
With master:
100 tps = 2837.095192 (without initial connection time)
200 tps = 2614.143255 (without initial connection time)
500 tps = 1960.666074 (without initial connection time)
1000 tps = 1390.691229 (without initial connection time)
2000 tps = 884.882656 (without initial connection time)
With 0001:
100 tps = 2889.600827 (without initial connection time)
200 tps = 2720.895632 (without initial connection time)
500 tps = 2096.177756 (without initial connection time)
1000 tps = 1659.265873 (without initial connection time)
2000 tps = 1148.976177 (without initial connection time)
With 0002:
100 tps = 3070.137629 (without initial connection time)
200 tps = 4589.336857 (without initial connection time)
500 tps = 2977.339119 (without initial connection time)
1000 tps = 2885.417560 (without initial connection time)
2000 tps = 3832.111167 (without initial connection time)
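For a rough sense of scale, the master and 0001 figures above can be turned into ratios. The sketch below is only arithmetic on the numbers already listed (tps_0001 and tps_speedup() are names made up for this example); given the noise noted earlier, the ratios should be read as approximate: about 2% at 100 partitions, growing to nearly 30% at 2000.

```c
#include <assert.h>

/* One row of the results above: master vs. 0001 for a partition count. */
typedef struct TpsRow
{
    int    partitions;
    double master;
    double patched;
} TpsRow;

/* Numbers copied from the "With master" and "With 0001" runs. */
static const TpsRow tps_0001[] = {
    {100, 2837.095192, 2889.600827},
    {200, 2614.143255, 2720.895632},
    {500, 1960.666074, 2096.177756},
    {1000, 1390.691229, 1659.265873},
    {2000, 884.882656, 1148.976177},
};

/* Ratio of patched to master throughput; > 1.0 means a speedup. */
static double
tps_speedup(const TpsRow *row)
{
    assert(row->master > 0.0);
    return row->patched / row->master;
}
```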
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v61-0001-Track-unpruned-relids-to-avoid-processing-pruned.patch (40.4K, 2-v61-0001-Track-unpruned-relids-to-avoid-processing-pruned.patch)
From 970c03dbc587bafc7ccb5769e81aa1496ad92319 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Fri, 31 Jan 2025 15:58:18 +0900
Subject: [PATCH v61 1/2] Track unpruned relids to avoid processing pruned
relations
This commit introduces changes to track unpruned relations explicitly,
ensuring that top-level plan nodes, such as ModifyTable and LockRows,
do not process partitions pruned with "initial" pruning. Scan-level
nodes, such as Append, already handle this correctly by processing
only unpruned relations.
The executor introduces a new es_unpruned_relids field in EState,
which tracks the set of unpruned relations at plan initialization
to ensure only unpruned relations are processed during execution.
This field is initialized with PlannedStmt.unprunableRelids, a new
field that tracks relations that cannot be pruned during runtime
pruning. These include relations not subject to partition pruning and
those required for execution regardless of pruning.
ExecDoInitialPruning() updates es_unpruned_relids by adding partitions
that survive initial pruning.
PlannedStmt.unprunableRelids is computed during set_plan_refs() by
removing the RT indexes of runtime-prunable relations, identified
from PartitionPruneInfos, from the full set of relation RT indexes.
To support this, PartitionedRelPruneInfo and PartitionedRelPruningData
now include a leafpart_rti_map[] array that maps partition indexes
from get_matching_partitions() to their corresponding RT indexes. This
mapping is used in ExecDoInitialPruning() to convert partition indexes
into RT indexes, which are then added to es_unpruned_relids.
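The two set operations described above can be sketched as a standalone toy, shown here for illustration only. ToyRelids is a hypothetical 64-bit mask standing in for a Bitmapset, the toy_* helpers are not PostgreSQL functions, and treating a map entry of 0 as "no RT index for this partition" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of RT indexes (1..63 in this toy). */
typedef uint64_t ToyRelids;

static ToyRelids
toy_add_member(ToyRelids set, int rti)
{
    assert(rti > 0 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

/*
 * Planner side: unprunable relids are all relation RT indexes minus those
 * of runtime-prunable relations.
 */
static ToyRelids
toy_unprunable(ToyRelids all_relids, ToyRelids prunable_relids)
{
    return all_relids & ~prunable_relids;
}

/*
 * Executor side: convert the partition indexes that survived initial
 * pruning into RT indexes through a leafpart_rti_map-like array and add
 * them to the unpruned set. Entries of 0 are skipped (assumption).
 */
static ToyRelids
toy_add_surviving(ToyRelids unpruned, const int *leafpart_rti_map,
                  const int *surviving, int nsurviving)
{
    for (int i = 0; i < nsurviving; i++)
    {
        int rti = leafpart_rti_map[surviving[i]];

        if (rti > 0)
            unpruned = toy_add_member(unpruned, rti);
    }
    return unpruned;
}
```

For example, with RT indexes 1..5 where 3, 4 and 5 are prunable leaf partitions, the unprunable set starts as {1, 2}; if only partition index 1 survives initial pruning and maps to RT index 4, the unpruned set becomes {1, 2, 4}.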
These changes ensure that top-level plan nodes, such as ModifyTable
and LockRows, process only relations that remain unpruned after
initial pruning. ExecInitModifyTable() trims lists, such as
resultRelations, withCheckOptionLists, returningLists, and
updateColnosLists, to include only unpruned partitions and creates
ResultRelInfo structs only for these partitions. Similarly, child
RowMarks for pruned relations are skipped. By avoiding unnecessary
initialization of structures for pruned partitions, these changes
improve the performance of updates and deletes on partitioned tables
with initial runtime pruning.
Due to ExecInitModifyTable() changes as described above, EXPLAIN
on a plan for UPDATE and DELETE that can use runtime initial pruning
no longer shows the "initially" pruned partitions in the list of
relations to be updated or deleted from.
Reviewed-by: Robert Haas (earlier versions)
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/commands/copyfrom.c | 3 +-
src/backend/executor/execMain.c | 19 ++++-
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 82 ++++++++++++++++---
src/backend/executor/execUtils.c | 19 ++++-
src/backend/executor/nodeAppend.c | 8 +-
src/backend/executor/nodeLockRows.c | 9 +-
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/nodeModifyTable.c | 70 +++++++++++++---
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 29 ++++++-
src/backend/partitioning/partprune.c | 15 ++++
src/backend/replication/logical/worker.c | 3 +-
src/backend/replication/pgoutput/pgoutput.c | 3 +-
src/include/executor/execPartition.h | 6 +-
src/include/executor/executor.h | 3 +-
src/include/nodes/execnodes.h | 10 +++
src/include/nodes/pathnodes.h | 8 ++
src/include/nodes/plannodes.h | 7 ++
src/test/regress/expected/partition_prune.out | 44 ++++++++++
src/test/regress/sql/partition_prune.sql | 18 ++++
21 files changed, 322 insertions(+), 39 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f5602..da1e8ddc5a1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -768,7 +768,8 @@ CopyFrom(CopyFromState cstate)
* index-entry-making machinery. (There used to be a huge amount of code
* here that basically duplicated execUtils.c ...)
*/
- ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos);
+ ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos,
+ bms_make_singleton(1));
resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, resultRelInfo, 1);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 604cb0625b8..5b989074203 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,7 +851,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
/*
* initialize the node's execution state
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
+ bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
@@ -881,8 +882,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Relation relation;
ExecRowMark *erm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* get relation's OID (will produce InvalidOid if subquery) */
@@ -2933,6 +2939,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
}
}
+ /*
+ * Copy es_unpruned_relids so that RowMarks of pruned relations are
+ * ignored in ExecInitLockRows() and ExecInitModifyTable() when
+ * initializing the plan trees below.
+ */
+ rcestate->es_unpruned_relids = parentestate->es_unpruned_relids;
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9c313d81315..134ff62f5cb 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
+ pstmt->unprunableRelids = estate->es_unpruned_relids;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4e9c32cef16..7ada26a541c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,7 +182,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -196,7 +197,8 @@ static void InitExecPartitionPruneContexts(PartitionPruneState *prunstate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis);
/*
@@ -1820,9 +1822,12 @@ ExecDoInitialPruning(EState *estate)
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
+ Bitmapset *all_leafpart_rtis = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo,
+ &all_leafpart_rtis);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
@@ -1831,7 +1836,13 @@ ExecDoInitialPruning(EState *estate)
* bitmapset or NULL as described in the header comment.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ else
+ validsubplan_rtis = all_leafpart_rtis;
+
+ estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
+ validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
@@ -1944,9 +1955,15 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* initialized here. Those required for exec pruning are initialized later in
* ExecInitPartitionExecPruning(), as they depend on the availability of the
* parent plan node's PlanState.
+ *
+ * On return, *all_leafpart_rtis will contain the RT indexes of all leaf
+ * partitions if initial pruning steps are skipped (e.g., during EXPLAIN
+ * (GENERIC_PLAN)). The caller is responsible for adding these RT indexes
+ * to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2039,8 +2056,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* The set of partitions that exist now might not be the same that
* existed when the plan was made. The normal case is that it is;
* optimize for that case with a quick comparison, and just copy
- * the subplan_map and make subpart_map point to the one in
- * PruneInfo.
+ * the subplan_map and make subpart_map and leafpart_rti_map point
+ * to the ones in PruneInfo.
*
* For the case where they aren't identical, we could have more
* partitions on either side; or even exactly the same number of
@@ -2059,6 +2076,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
sizeof(int) * partdesc->nparts) == 0)
{
pprune->subpart_map = pinfo->subpart_map;
+ pprune->leafpart_rti_map = pinfo->leafpart_rti_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
}
@@ -2079,6 +2097,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* mismatches.
*/
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->leafpart_rti_map = palloc(sizeof(int) * partdesc->nparts);
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
@@ -2096,6 +2115,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->leafpart_rti_map[pp_idx] =
+ pinfo->leafpart_rti_map[pd_idx];
pd_idx++;
continue;
}
@@ -2133,6 +2154,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map[pp_idx] = -1;
pprune->subplan_map[pp_idx] = -1;
+ pprune->leafpart_rti_map[pp_idx] = 0;
}
}
@@ -2174,6 +2196,25 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
prunestate->execparamids = bms_add_members(prunestate->execparamids,
pinfo->execparamids);
+ /*
+ * Return the RT indexes of all leaf partitions if we're skipping
+ * initial pruning in the EXPLAIN (GENERIC_PLAN) case.
+ */
+ if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
+ {
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
+ rtindex);
+ }
+ }
+
j++;
}
i++;
@@ -2439,10 +2480,15 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * The caller must pass a non-NULL validsubplan_rtis during initial pruning
+ * to collect the RT indexes of leaf partitions whose subnodes will be
+ * executed. These RT indexes are later added to EState.es_unpruned_relids.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2454,6 +2500,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* evaluated *and* there are steps in which to do so.
*/
Assert(initial_prune || prunestate->do_exec_prune);
+ Assert(validsubplan_rtis != NULL || !initial_prune);
/*
* Switch to a temp context to avoid leaking memory in the executor's
@@ -2477,7 +2524,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, validsubplan_rtis);
/*
* Expression eval may have used space in ExprContext too.
@@ -2495,6 +2542,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_copy(*validsubplan_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2505,13 +2554,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of their corresponding leaf partitions to *validsubplan_rtis if
+ * it's non-NULL.
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *partset;
int i;
@@ -2538,8 +2590,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
+ pprune->leafpart_rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2547,7 +2604,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ validsubplan_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 6aac6f3a872..1792c40a5cb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -771,7 +771,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
* indexed by rangetable index.
*/
void
-ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
+ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids)
{
/* Remember the range table List as-is */
estate->es_range_table = rangeTable;
@@ -782,6 +783,15 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
/* Set size of associated arrays */
estate->es_range_table_size = list_length(rangeTable);
+ /*
+ * Initialize the bitmapset of RT indexes (es_unpruned_relids)
+ * representing relations that will be scanned during execution. This set
+ * is initially populated by the caller and may be extended later by
+ * ExecDoInitialPruning() to include RT indexes of unpruned leaf
+ * partitions.
+ */
+ estate->es_unpruned_relids = unpruned_relids;
+
/*
* Allocate an array to store an open Relation corresponding to each
* rangetable entry, and initialize entries to NULL. Relations are opened
@@ -803,6 +813,10 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed in ExecEndPlan().
+ *
+ * Note: The caller must ensure that 'rti' refers to an unpruned relation
+ * (i.e., it is a member of estate->es_unpruned_relids) before calling this
+ * function. Attempting to open a pruned relation will result in an error.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -811,6 +825,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ if (!bms_is_member(rti, estate->es_unpruned_relids))
+ elog(ERROR, "trying to open a pruned relation");
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 2397e5e17b0..15c4227cc62 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -595,7 +595,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -662,7 +662,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -738,7 +738,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -891,7 +891,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 4e4e3db0b38..a8afbf93b48 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -347,8 +347,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at runtime.
+ * Also ignore the rowmarks belonging to child tables that have been
+ * pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index b2dc6626c99..405e8f94285 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -233,7 +233,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bc82e035ba2..349ed2d6d2c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -690,7 +690,7 @@ ExecInitUpdateProjection(ModifyTableState *mtstate,
Assert(whichrel >= 0 && whichrel < mtstate->mt_nrels);
}
- updateColnos = (List *) list_nth(node->updateColnosLists, whichrel);
+ updateColnos = (List *) list_nth(mtstate->mt_updateColnosLists, whichrel);
/*
* For UPDATE, we use the old tuple to fill up missing values in the tuple
@@ -4453,7 +4453,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ModifyTableState *mtstate;
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
- int nrels = list_length(node->resultRelations);
+ int nrels;
+ List *resultRelations = NIL;
+ List *withCheckOptionLists = NIL;
+ List *returningLists = NIL;
+ List *updateColnosLists = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4463,6 +4467,45 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+ /*
+ * Only consider unpruned relations for initializing their ResultRelInfo
+ * struct and other fields such as withCheckOptions, etc.
+ */
+ i = 0;
+ foreach(l, node->resultRelations)
+ {
+ Index rti = lfirst_int(l);
+
+ if (bms_is_member(rti, estate->es_unpruned_relids))
+ {
+ resultRelations = lappend_int(resultRelations, rti);
+ if (node->withCheckOptionLists)
+ {
+ List *withCheckOptions = list_nth_node(List,
+ node->withCheckOptionLists,
+ i);
+
+ withCheckOptionLists = lappend(withCheckOptionLists, withCheckOptions);
+ }
+ if (node->returningLists)
+ {
+ List *returningList = list_nth_node(List,
+ node->returningLists,
+ i);
+
+ returningLists = lappend(returningLists, returningList);
+ }
+ if (node->updateColnosLists)
+ {
+ List *updateColnosList = list_nth(node->updateColnosLists, i);
+
+ updateColnosLists = lappend(updateColnosLists, updateColnosList);
+ }
+ }
+ i++;
+ }
+ nrels = list_length(resultRelations);
+
/*
* create state structure
*/
@@ -4483,6 +4526,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->mt_merge_inserted = 0;
mtstate->mt_merge_updated = 0;
mtstate->mt_merge_deleted = 0;
+ mtstate->mt_updateColnosLists = updateColnosLists;
/*----------
* Resolve the target relation. This is the same as:
@@ -4500,6 +4544,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
if (node->rootRelation > 0)
{
+ Assert(bms_is_member(node->rootRelation, estate->es_unpruned_relids));
mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
node->rootRelation);
@@ -4514,7 +4559,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
- node->epqParam, node->resultRelations);
+ node->epqParam, resultRelations);
mtstate->fireBSTriggers = true;
/*
@@ -4532,7 +4577,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
resultRelInfo = mtstate->resultRelInfo;
i = 0;
- foreach(l, node->resultRelations)
+ foreach(l, resultRelations)
{
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
@@ -4676,7 +4721,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize any WITH CHECK OPTION constraints if needed.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->withCheckOptionLists)
+ foreach(l, withCheckOptionLists)
{
List *wcoList = (List *) lfirst(l);
List *wcoExprs = NIL;
@@ -4699,7 +4744,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/*
* Initialize RETURNING projections if needed.
*/
- if (node->returningLists)
+ if (returningLists)
{
TupleTableSlot *slot;
ExprContext *econtext;
@@ -4708,7 +4753,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize result tuple slot and assign its rowtype using the first
* RETURNING list. We assume the rest will look the same.
*/
- mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
+ mtstate->ps.plan->targetlist = (List *) linitial(returningLists);
/* Set up a slot for the output of the RETURNING projection(s) */
ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
@@ -4723,7 +4768,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->returningLists)
+ foreach(l, returningLists)
{
List *rlist = (List *) lfirst(l);
@@ -4824,8 +4869,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at runtime.
+ * Also ignore the rowmarks belonging to child tables that have been
+ * pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* Find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8a474a50be7..e5e02e86b24 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -557,6 +557,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(glob->allRelids,
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 0868249be94..999a5a8ab5a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -564,7 +564,9 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
/*
* If it's a plain relation RTE (or a subquery that was once a view
- * reference), add the relation OID to relationOids.
+ * reference), add the relation OID to relationOids. Also add its new RT
+ * index to the set of relations to be potentially accessed during
+ * execution.
*
* We do this even though the RTE might be unreferenced in the plan tree;
* this would correspond to cases such as views that were expanded, child
@@ -576,7 +578,11 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
*/
if (newrte->rtekind == RTE_RELATION ||
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
+ {
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ glob->allRelids = bms_add_member(glob->allRelids,
+ list_length(glob->finalrtable));
+ }
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
@@ -1740,6 +1746,10 @@ set_customscan_references(PlannerInfo *root,
*
* Also update the RT indexes present in PartitionedRelPruneInfos to add the
* offset.
+ *
+ * Finally, if there are initial pruning steps, add the RT indexes of the
+ * leaf partitions to the set of relations that are prunable at execution
+ * startup time.
*/
static int
register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
@@ -1762,6 +1772,7 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ int i;
prelinfo->rtindex += rtoffset;
prelinfo->initial_pruning_steps =
@@ -1770,6 +1781,22 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
prelinfo->exec_pruning_steps =
fix_scan_list(root, prelinfo->exec_pruning_steps,
rtoffset, 1);
+
+ for (i = 0; i < prelinfo->nparts; i++)
+ {
+ /*
+ * Non-leaf partitions and partitions that do not have a
+ * subplan are not included in this map as mentioned in
+ * make_partitionedrel_pruneinfo().
+ */
+ if (prelinfo->leafpart_rti_map[i])
+ {
+ prelinfo->leafpart_rti_map[i] += rtoffset;
+ if (prelinfo->initial_pruning_steps)
+ glob->prunableRelids = bms_add_member(glob->prunableRelids,
+ prelinfo->leafpart_rti_map[i]);
+ }
+ }
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 4693eef0c58..ff926732f36 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -645,6 +645,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ int *leafpart_rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -657,6 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ leafpart_rti_map = (int *) palloc0(nparts * sizeof(int));
present_parts = NULL;
i = -1;
@@ -671,9 +673,21 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of "leaf" partitions so they can be
+ * included in the PlannerGlobal.prunableRelids set, indicating
+ * relations that may be pruned during executor startup.
+ *
+ * Only leaf partitions with a valid subplan that are prunable
+ * using initial pruning are added to prunableRelids. So
+ * partitions without a subplan due to constraint exclusion will
+ * remain in PlannedStmt.unprunableRelids.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ leafpart_rti_map[i] = (int) partrel->relid;
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -695,6 +709,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->leafpart_rti_map = leafpart_rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 6966037d2ef..f09ab41c605 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -668,7 +668,8 @@ create_edata_for_relation(LogicalRepRelMapEntry *rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 0227fcbca3d..2f89996a757 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -820,7 +820,8 @@ create_estate_for_relation(Relation rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
estate->es_output_cid = GetCurrentCommandId(false);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 855fed4fea5..951009cf46c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -48,6 +48,8 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * leafpart_rti_map RT index by partition index, or 0 if not a leaf
+ * partition.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -65,6 +67,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ int *leafpart_rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -135,6 +138,7 @@ extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c7db6defd3e..e9ccc438cdd 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -595,7 +595,8 @@ extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
-extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
+extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids);
extern void ExecCloseRangeTableRelations(EState *estate);
extern void ExecCloseResultRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index aca15f771a2..a2cba97e3d5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -658,6 +658,10 @@ typedef struct EState
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
+ Bitmapset *es_unpruned_relids; /* PlannedStmt.unprunableRelids + RT
+ * indexes of leaf partitions that survive
+ * initial pruning; see
+ * ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -1440,6 +1444,12 @@ typedef struct ModifyTableState
double mt_merge_inserted;
double mt_merge_updated;
double mt_merge_deleted;
+
+ /*
+ * List of valid updateColnosLists. Contains only those belonging to
+ * unpruned relations from ModifyTable.updateColnosLists.
+ */
+ List *mt_updateColnosLists;
} ModifyTableState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 52d44f43021..2fe5179ca77 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,14 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of all relation RTEs in finalrtable (RTE_RELATION and
+ * RTE_SUBQUERY RTEs of views) and of those that are subject to runtime
+ * pruning at plan initialization time ("initial" pruning).
+ */
+ Bitmapset *allRelids;
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 06d9559ebb9..4abefa7bec0 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; set for
+ * AcquireExecutorLocks(). */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1483,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* subpart index by partition index, or -1 */
int *subpart_map pg_node_attr(array_size(nparts));
+ /* RT index by partition index, or 0 if not a leaf partition */
+ int *leafpart_rti_map pg_node_attr(array_size(nparts));
+
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index f0707e7f7ea..7c0c40117ae 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4469,3 +4469,47 @@ drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (1, 'd');
+ a | b | c
+---+---+---
+ 1 | d | t
+(1 row)
+
+explain (costs off) execute update_part_abc_view (2, 'a');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_2 part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_2 part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (2, 'a');
+ERROR: new row violates check option for view "part_abc_view"
+DETAIL: Failing row contains (2, a, t).
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ea9a4fe4a23..06620640f87 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1354,3 +1354,21 @@ drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+execute update_part_abc_view (1, 'd');
+explain (costs off) execute update_part_abc_view (2, 'a');
+execute update_part_abc_view (2, 'a');
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
--
2.43.0
[application/octet-stream] v61-0002-Defer-locking-of-runtime-prunable-relations-in-c.patch (87.8K, 3-v61-0002-Defer-locking-of-runtime-prunable-relations-in-c.patch)
From b87d47c1a62f8ef3e4bbee19e05829811f813226 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 28 Jan 2025 20:03:03 +0900
Subject: [PATCH v61 2/2] Defer locking of runtime-prunable relations in cached
plans
AcquireExecutorLocks() in plancache.c locks all relations in a plan's
range table to ensure the plan is safe for execution. However, this
locks runtime-prunable relations that will later be pruned during
"initial" runtime pruning, introducing unnecessary overhead. This
commit defers locking for such relations and ensures that any
invalidation caused by this deferral triggers replanning when needed.
AcquireExecutorLocks() now locks only unprunable relations to avoid
locking runtime-prunable partitions unnecessarily. This deferral of
locks ensures that runtime-prunable relations are handled later during
executor startup, minimizing overhead and reducing contention in
workloads involving partitioned tables.
This results in significant speedups for generic plans with many
runtime-prunable partitions.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks are properly locked.
* Plan invalidation handling:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and
handle invalidation caused by deferred locking. If invalidation
occurs, ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the refreshed
plan.
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A
new CachedPlan.stmt_context, as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and statements list.
ExecutorStart() and ExecutorStart_hook now return a boolean value
indicating whether plan initialization succeeded with a valid
PlanState tree in QueryDesc.planstate.
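The retry behaviour described above can be modelled with a small standalone sketch. Everything in it is a hypothetical stand-in (the Toy* types, the toy_* functions, and the pending_invalidations counter that simulates concurrent DDL); it does not use the real PostgreSQL structs or APIs and only mirrors the control flow of the loop: start, detect invalidation, clean up, replan, retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL structs. */
typedef struct ToyCachedPlan
{
    bool        is_valid;       /* cleared when the plan is invalidated */
    int         generation;     /* bumped on every replan */
} ToyCachedPlan;

typedef struct ToyQueryDesc
{
    ToyCachedPlan *cplan;
    int         pending_invalidations;  /* simulated concurrent DDL events */
    int         start_attempts;         /* calls to toy_executor_start() */
    int         cleanups;               /* calls to toy_executor_end() */
} ToyQueryDesc;

/*
 * Models ExecutorStart(): taking the deferred locks may reveal that the
 * plan was invalidated, in which case false is returned.
 */
static bool
toy_executor_start(ToyQueryDesc *qd)
{
    qd->start_attempts++;
    if (qd->pending_invalidations > 0)
    {
        qd->pending_invalidations--;
        qd->cplan->is_valid = false;
    }
    return qd->cplan->is_valid;
}

/* Models ExecutorEnd() cleaning up an aborted initialization. */
static void
toy_executor_end(ToyQueryDesc *qd)
{
    qd->cleanups++;
}

/* Models UpdateCachedPlan(): replace the stale plan with a fresh one. */
static void
toy_update_cached_plan(ToyCachedPlan *cplan)
{
    cplan->generation++;
    cplan->is_valid = true;
}

/* Models ExecutorStartCachedPlan(): retry until initialization succeeds. */
static void
toy_executor_start_cached_plan(ToyQueryDesc *qd)
{
    while (!toy_executor_start(qd))
    {
        toy_executor_end(qd);
        toy_update_cached_plan(qd->cplan);
    }
}
```

With two simulated invalidations, the model makes three start attempts, cleans up twice, and ends on the second replanned generation. The loop has no retry limit, which matches the description of repeating until no further invalidation occurs.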
* Testing:
The delay_execution module tests scenarios where cached plans become
invalid due to changes in prunable relations after deferred locks.
* Note to extension authors:
ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(). For example:
    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;
Extensions inspecting RT indexes must ensure the relation is locked
before processing it. For example, see how InitPlan() processes
PlannedStmt.rowMarks.
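As a rough illustration of the "locked before processing" rule, here is a toy model that assumes an extension can tell whether an RT index survived pruning (and was therefore locked) by testing membership in a set of unpruned RT indexes, in the spirit of es_unpruned_relids. The ToyRelids bitmask and the toy_* helpers are hypothetical and stand in for the real Bitmapset API; this is a sketch of the skip-if-pruned pattern, not the actual InitPlan() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy model: bit i set means RT index i is unpruned, and
 * hence locked.  The real executor uses a Bitmapset, not a uint64_t.
 */
typedef uint64_t ToyRelids;

/* Returns true if the given RT index is in the unpruned set. */
static bool
toy_is_unpruned(ToyRelids unpruned, int rti)
{
    return rti > 0 && rti < 64 && ((unpruned >> rti) & 1u) != 0;
}

/*
 * Walks a list of RT indexes and counts only those that are safe to
 * process, skipping any relation that was pruned and thus never locked.
 */
static int
toy_count_processable(ToyRelids unpruned, const int *rtis, int n)
{
    int         processed = 0;

    for (int i = 0; i < n; i++)
    {
        if (!toy_is_unpruned(unpruned, rtis[i]))
            continue;           /* pruned: not locked, must not be touched */
        processed++;
    }
    return processed;
}
```

The point of the pattern is that the check comes first: a pruned relation is skipped before any code opens or inspects it.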
Reviewed-by: Robert Haas (earlier versions)
Reviewed-by: David Rowley (earlier versions)
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 16 +-
.../pg_stat_statements/pg_stat_statements.c | 16 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 5 +-
src/backend/commands/explain.c | 22 +-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 5 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/commands/trigger.c | 14 +
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 130 +++++++-
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 38 ++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 4 +-
src/backend/executor/spi.c | 29 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 51 +++-
src/backend/utils/cache/plancache.c | 204 +++++++++++--
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 6 +-
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 34 ++-
src/include/nodes/execnodes.h | 3 +
src/include/utils/plancache.h | 50 +++-
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 ++++-
.../expected/cached-plan-inval.out | 282 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 80 +++++
33 files changed, 1043 insertions(+), 99 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f1ad876e821..82c17c0a28a 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -76,7 +76,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -256,9 +256,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -294,9 +296,13 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
if (auto_explain_enabled())
{
@@ -314,6 +320,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bebf8134eb0..b735381cb0b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -332,7 +332,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -986,13 +986,19 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1015,6 +1021,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..091fbc12cc5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -556,7 +556,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -566,7 +566,8 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- ExecutorStart(cstate->queryDesc, 0);
+ if (!ExecutorStart(cstate->queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 23cecd99c9e..44b4665ccd3 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -332,12 +332,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ if (!ExecutorStart(queryDesc, GetIntoRelEFlags(into)))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c24e66f82e1..af25c16d215 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -519,7 +519,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -641,7 +642,9 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -697,7 +700,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -711,8 +714,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Prepare the plan for execution. */
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index ba540e3de5b..1b28d20412e 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -907,11 +907,13 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ if (!ExecutorStart(qdesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
ExecutorRun(qdesc, ForwardScanDirection, 0);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c12817091ed..0bfbc5ca6dc 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,12 +438,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index e7c8171c102..4c2ac045224 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -117,6 +117,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 8989c0c882d..c025b1f9f8c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -582,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int query_index = 0;
if (es->memory)
{
@@ -654,7 +656,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -665,6 +668,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ query_index++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7a5ffe32f60..c40bb0fdc9f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5143,6 +5143,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be613..449c6068ae9 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+flag. If the plan tree is outdated (is_valid = false), the executor stops
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartCachedPlan
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), the control is
+immediately returned to ExecutorStartCachedPlan(), which will create a new plan
+tree and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5b989074203..ec2387d7f1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,11 +55,13 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -114,11 +116,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Return value indicates if the plan has been initialized successfully so
+ * that queryDesc->planstate contains a valid PlanState tree. It may not
+ * if the plan got invalidated during InitPlan().
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -130,12 +137,14 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
+ plan_valid = (*ExecutorStart_hook) (queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ return plan_valid;
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
EState *estate;
@@ -259,6 +268,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ return ExecPlanStillValid(queryDesc->estate);
+}
+
+/*
+ * ExecutorStartCachedPlan
+ * Start execution for a given query in the CachedPlanSource, replanning
+ * if the plan is invalidated due to deferred locks taken during the
+ * plan's initialization
+ *
+ * This function handles cases where the CachedPlan given in queryDesc->cplan
+ * might become invalid during the initialization of the plan given in
+ * queryDesc->plannedstmt, particularly when prunable relations in it are
+ * locked after performing initial pruning. If the locks invalidate the plan,
+ * the function calls UpdateCachedPlan() to replan all queries in the
+ * CachedPlan, and then retries initialization.
+ *
+ * The function repeats the process until ExecutorStart() successfully
+ * initializes the plan, that is without the CachedPlan becoming invalid.
+ */
+void
+ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (unlikely(queryDesc->cplan == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
+ if (unlikely(plansource == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlanSource");
+
+ /*
+ * Loop and retry with an updated plan until no further invalidation
+ * occurs.
+ */
+ while (1)
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /*
+ * Clean up the current execution state before creating the new
+ * plan to retry ExecutorStart(). Mark execution as aborted to
+ * ensure that AFTER trigger state is properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ /* Retry ExecutorStart() with an updated plan tree. */
+ queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+ }
+ else
+
+ /*
+ * Exit the loop if the plan is initialized successfully and no
+ * sinval messages were received that invalidated the CachedPlan.
+ */
+ break;
+ }
}
/* ----------------------------------------------------------------
@@ -317,6 +384,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -423,8 +491,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -487,11 +558,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
(PgStat_Counter) estate->es_parallel_workers_launched);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -505,6 +575,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -603,6 +681,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it’s necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -828,6 +921,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If the plan originates from a CachedPlan (given in queryDesc->cplan),
+ * it can become invalid during runtime "initial" pruning when the
+ * remaining set of locks is taken. The function returns early in that
+ * case without initializing the plan, and the caller is expected to
+ * retry with a new valid plan.
* ----------------------------------------------------------------
*/
static void
@@ -835,6 +934,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -855,6 +955,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -865,9 +966,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* executed, are saved in es_part_prune_results. These results correspond
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
*/
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2868,6 +2975,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 134ff62f5cb..1bedb808368 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1258,8 +1258,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1440,7 +1447,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ if (!ExecutorStart(queryDesc, fpes->eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7ada26a541c..8eb95e92816 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -1768,7 +1769,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * all plan nodes that contain a PartitionPruneInfo.
+ * all plan nodes that contain a PartitionPruneInfo. This also locks the
+ * leaf partitions whose subnodes will be initialized if needed.
*
* ExecInitPartitionExecPruning:
* Updates the PartitionPruneState found at given part_prune_index in
@@ -1789,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * plan nodes that support partition pruning.
+ * plan nodes that support partition pruning. This also locks the leaf
+ * partitions whose subnodes will be initialized if needed.
*
* This function iterates over each PartitionPruneInfo entry in
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
@@ -1816,6 +1820,7 @@ void
ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
@@ -1841,11 +1846,40 @@ ExecDoInitialPruning(EState *estate)
else
validsubplan_rtis = all_leafpart_rtis;
+ if (ExecShouldLockRelations(estate))
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ locked_relids = lappend_int(locked_relids, rtindex);
+ }
+ }
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
+
+ /*
+ * Release the useless locks if the plan won't be executed. This is the
+ * same as what CheckCachedPlan() in plancache.c does.
+ */
+ if (!ExecPlanStillValid(estate))
+ {
+ foreach(lc, locked_relids)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 1792c40a5cb..67926178759 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 757f8068e21..6aa8e9c4d8a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -864,7 +865,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+ if (!ExecutorStart(es->qd, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index ecb2e4ccaa1..3288396def3 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1685,7 +1686,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2500,6 +2502,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int query_index = 0;
spicallbackarg.query = plansource->query_string;
@@ -2690,14 +2693,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, query_index);
FreeQueryDesc(qdesc);
}
else
@@ -2794,6 +2799,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ query_index++;
}
/* Done with this plan, so release refcount */
@@ -2871,7 +2878,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2927,7 +2935,16 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
ExecutorRun(queryDesc, ForwardScanDirection, tcount);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5655348a2e2..f60f2785bc1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1224,6 +1224,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2025,7 +2026,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6f22496305a..dea24453a6c 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -36,6 +37,9 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +69,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +82,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +128,9 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +143,9 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,14 +157,23 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, 0, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* Run the plan to completion.
@@ -493,6 +514,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -512,9 +534,19 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (portal->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, myeflags,
+ portal->plansource, 0);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, myeflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* This tells PortalCleanup to shut down the executor
@@ -1188,6 +1220,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int query_index = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1269,6 +1302,9 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1278,6 +1314,9 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1342,6 +1381,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ query_index++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 55db8f53705..71839dca108 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,7 +101,8 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ bool release_generic);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -578,10 +579,17 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
+ *
+ * This also releases and drops the generic plan (plansource->gplan), if any,
+ * as most callers will typically build a new CachedPlan for the plansource
+ * right after this. However, when called from UpdateCachedPlan(), the
+ * function does not release the generic plan, as UpdateCachedPlan() updates
+ * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv)
+ QueryEnvironment *queryEnv,
+ bool release_generic)
{
bool snapshot_set;
RawStmt *rawtree;
@@ -678,8 +686,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference if any */
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference, if any, and if requested */
+ if (release_generic)
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -815,8 +824,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. However, the plans are not fully
+ * race-condition-free until the executor acquires locks on the prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -870,7 +882,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
- /* Successfully revalidated and locked the query. */
+ /*
+ * Successfully revalidated and locked the query. Set is_reused
+ * to true so that CachedPlanRequiresLocking() returns true.
+ */
+ plan->is_reused = true;
return true;
}
@@ -895,12 +911,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* To build a generic, parameter-value-independent plan, pass NULL for
* boundParams. To build a custom plan, pass the actual parameter values via
* boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * each parameter value; otherwise the planner will treat the value as a hint
+ * rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -911,6 +929,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
+ MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -928,7 +947,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/*
* If we don't already have a copy of the querytree list that can be
@@ -967,10 +986,19 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally we make a dedicated memory context for the CachedPlan and its
- * subsidiary data. (It's probably not going to be large, but just in
- * case, allow it to grow large. It's transient for the moment.) But for
- * a one-shot plan, we just leave it in the caller's memory context.
+ * Normally, we create a dedicated memory context for the CachedPlan and
+ * its subsidiary data. Although it's usually not very large, the context
+ * is designed to allow growth if necessary.
+ *
+ * The PlannedStmts are stored in a separate child context (stmt_context)
+ * of the CachedPlan's memory context. This separation allows
+ * UpdateCachedPlan() to free and replace the PlannedStmts without
+ * affecting the CachedPlan structure or its stmt_list List.
+ *
+ * For one-shot plans, we instead use the caller's memory context, as the
+ * CachedPlan will not persist. stmt_context will be set to NULL in this
+ * case, because UpdateCachedPlan() should never get called on a one-shot
+ * plan.
*/
if (!plansource->is_oneshot)
{
@@ -979,12 +1007,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- /*
- * Copy plan into the new context.
- */
- MemoryContextSwitchTo(plan_context);
+ stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan PlannedStmts",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
+ MemoryContextSetParent(stmt_context, plan_context);
+ MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
+
+ MemoryContextSwitchTo(plan_context);
+ plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1025,8 +1058,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
+ plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
+ plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1153,8 +1188,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid, but not all locks are acquired if the
+ * returned plan is a reused generic plan. In such cases, locks on relations
+ * subject to initial runtime pruning are not taken by CheckCachedPlan() but
+ * deferred until the execution startup phase, specifically when
+ * ExecDoInitialPruning() performs initial pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1180,7 +1218,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1276,6 +1314,113 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * UpdateCachedPlan
+ * Create fresh plans for all queries in the CachedPlanSource, replacing
+ * those in the generic plan's stmt_list, and return the plan for the
+ * query_index'th query.
+ *
+ * This function is primarily used by ExecutorStartCachedPlan() to handle
+ * cases where the original generic CachedPlan becomes invalid. Such
+ * invalidation may occur when prunable relations in the old plan for the
+ * query_index'th query are locked in preparation for execution.
+ *
+ * Note that invalidations received during the execution of the query_index'th
+ * query can affect both the queries that have already finished execution
+ * (e.g., due to concurrent modifications on prunable relations that were not
+ * locked during their execution) and also the queries that have not yet been
+ * executed. As a result, this function updates all plans to ensure
+ * CachedPlan.is_valid is safely set to true.
+ *
+ * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
+ * the caller and any of its callers must not rely on them remaining accessible
+ * after this function is called.
+ */
+PlannedStmt *
+UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ ListCell *l1,
+ *l2;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
+ else if (plan->is_oneshot)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan() returned
+ * the CachedPlan. See the comment in BuildCachedPlan() for details on why
+ * this might happen. Although invalidation is likely a false positive as
+ * stated there, we revalidate the query tree to ensure the query list used
+ * for planning is up to date.
+ *
+ * The risk of catching an invalidation is higher here than when
+ * BuildCachedPlan() is called from GetCachedPlan(), because this function
+ * is normally called long after GetCachedPlan() returns the CachedPlan,
+ * so much more processing could have occurred including things that mark
+ * the CachedPlanSource invalid.
+ *
+ * Note: Do not release plansource->gplan, because the upstream callers
+ * (such as the callers of ExecutorStartCachedPlan()) would still be
+ * referencing it.
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv, false);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for all the queries after making a copy to be
+ * scribbled on by the planner.
+ */
+ query_list = copyObject(query_list);
+
+ /*
+ * Planning work is done in the caller's memory context. The resulting
+ * PlannedStmts are then copied into plan->stmt_context after throwing away
+ * the old ones.
+ */
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+ Assert(list_length(plan_list) == list_length(plan->stmt_list));
+
+ MemoryContextReset(plan->stmt_context);
+ oldcxt = MemoryContextSwitchTo(plan->stmt_context);
+ forboth(l1, plan_list, l2, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst(l1);
+
+ lfirst(l2) = copyObject(plannedstmt);
+ }
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * XXX Should this also (re)set the properties of the CachedPlan that are
+ * set in BuildCachedPlan() after creating the fresh plans such as
+ * planRoleId, dependsOnRole, and saved_xmin?
+ */
+
+ /*
+ * We've updated all the plans that might have been invalidated, so mark
+ * the CachedPlan as valid.
+ */
+ plan->is_valid = true;
+
+ /* Also update generic_cost because we just created a new generic plan. */
+ plansource->generic_cost = cached_plan_cost(plan, false);
+
+ return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1654,7 +1799,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv);
+ RevalidateCachedQuery(plansource, queryEnv, true);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -1776,7 +1921,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,13 +1939,16 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 0be1c2b0fff..e3526e78064 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f4..570e7cad1fa 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,10 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 2ed2c4bb378..4180601dcd4 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..ba53305ad42 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index e9ccc438cdd..c055b4436bc 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -72,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -191,8 +192,11 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -255,6 +259,30 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
+
+/*
+ * Locks are needed only if running a cached plan that might contain unlocked
+ * relations, such as a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a2cba97e3d5..9519dca374b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -655,6 +656,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
@@ -707,6 +709,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 46072d311b1..2d83f7d4930 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,8 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
+#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -139,10 +141,11 @@ typedef struct CachedPlanSource
* The reference count includes both the link from the parent CachedPlanSource
* (if any), and any active plan executions, so the plan can be discarded
* exactly when refcount goes to zero. Both the struct itself and the
- * subsidiary data live in the context denoted by the context field.
- * This makes it easy to free a no-longer-needed cached plan. (However,
- * if is_oneshot is true, the context does not belong solely to the CachedPlan
- * so no freeing is possible.)
+ * subsidiary data, except the PlannedStmts in stmt_list, live in the context
+ * denoted by the context field; the PlannedStmts live in the context denoted
+ * by stmt_context. Separate contexts make it easy to free a no-longer-needed
+ * cached plan. (However, if is_oneshot is true, the context does not belong
+ * solely to the CachedPlan so no freeing is possible.)
*/
typedef struct CachedPlan
{
@@ -150,6 +153,7 @@ typedef struct CachedPlan
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
+ bool is_reused; /* is it a reused generic plan? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
@@ -158,6 +162,10 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext stmt_context; /* context containing the PlannedStmts in
+ * stmt_list, but not the List itself which is
+ * in the above context; NULL if is_oneshot is
+ * true. */
} CachedPlan;
/*
@@ -223,6 +231,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -235,4 +247,34 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire additional locks?
+ *
+ * If the plan is a saved generic plan, the executor must acquire locks for
+ * relations that are not covered by AcquireExecutorLocks(), such as partitions
+ * that are subject to initial runtime pruning.
+ *
+ * Note: These locks are unnecessary if the plan is executed immediately after
+ * its creation, since the planner would have already acquired them. However,
+ * we do not optimize for that case.
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return !cplan->is_oneshot && cplan->is_reused;
+}
+
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check that the plan has not been invalidated
+ * after taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 0b62143af8b..ddee031f551 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -240,7 +241,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846da..3eeb097fde4 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7bc97f84a1c..844af6bd061 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2025, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/fmgrprotos.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 00000000000..5bfb2b33b39
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,282 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(56 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(41 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Index Scan using foo12_1_a on foo12_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(9 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Disabled: true
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(10 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index b53488f76d2..58159bfc574 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 00000000000..f27e8fb521c
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,80 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE TABLE bar (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE bar1 PARTITION OF bar FOR VALUES IN (1);
+ CREATE INDEX ON bar1(a);
+ CREATE TABLE bar2 PARTITION OF bar FOR VALUES IN (2);
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO UPDATE bar SET a = a WHERE a = one();
+ CREATE RULE update_bar AS ON UPDATE TO bar DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo, bar;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Another case with Append with run-time pruning in a subquery
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2dropi" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
--
2.43.0
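The hook chaining in delay_execution_ExecutorStart() and the "not valid" followed by "valid" NOTICE pairs in the expected output can be modelled outside the backend. What follows is only a minimal sketch under stated assumptions: every Demo*/demo_* name is hypothetical, no PostgreSQL API is used, and the retry loop is a simplification of "discard the cached plan and recreate it", not the actual plancache code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QueryDesc; holds only what the sketch needs. */
typedef struct DemoQueryDesc
{
    int invalidations_pending;  /* concurrent changes not yet noticed */
    int plan_generation;        /* bumped on every simulated replan */
    int start_calls;            /* how often the "executor" was started */
} DemoQueryDesc;

/* Hypothetical hook type: like ExecutorStart_hook_type, it returns bool. */
typedef bool (*demo_start_hook_type) (DemoQueryDesc *qd, int eflags);

static demo_start_hook_type demo_start_hook = NULL;
static demo_start_hook_type prev_demo_start_hook = NULL;

/* Plays the role of standard_ExecutorStart(): false means the plan is stale. */
static bool
demo_standard_start(DemoQueryDesc *qd, int eflags)
{
    (void) eflags;
    qd->start_calls++;
    if (qd->invalidations_pending > 0)
    {
        qd->invalidations_pending--;
        return false;
    }
    return true;
}

/* A hook that chains to the previous user and passes the result up unchanged. */
static bool
demo_delay_start(DemoQueryDesc *qd, int eflags)
{
    /* A real module would do its extra work here (e.g. lock and unlock). */
    if (prev_demo_start_hook)
        return prev_demo_start_hook(qd, eflags);
    return demo_standard_start(qd, eflags);
}

/* Install the hook once, remembering any previous hook user. */
static void
demo_install_hook(void)
{
    if (demo_start_hook == demo_delay_start)
        return;                 /* already installed; avoid chaining to self */
    prev_demo_start_hook = demo_start_hook;
    demo_start_hook = demo_delay_start;
}

/* Caller side: start, and on "not valid" replan and start again. */
static int
demo_run(DemoQueryDesc *qd)
{
    for (;;)
    {
        bool valid = demo_start_hook ? demo_start_hook(qd, 0)
                                     : demo_standard_start(qd, 0);

        if (valid)
            return qd->plan_generation;
        qd->plan_generation++;  /* stand-in for building a fresh plan */
    }
}
```

With one pending invalidation, calling demo_install_hook() and then demo_run() starts the simulated executor twice: the first start reports a stale plan and the second succeeds on the regenerated one, which matches the shape of the NOTICE output in the test.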
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2025-02-06 02:35 ` Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-14 21:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
0 siblings, 2 replies; 66+ messages in thread
From: Amit Langote @ 2025-02-06 02:35 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Fri, Jan 31, 2025 at 5:31 PM Amit Langote <[email protected]> wrote:
> On Thu, Jan 23, 2025 at 4:15 PM Amit Langote <[email protected]> wrote:
> > I've rebased over recent changes to setrefs.c (commit bf826ea0629).
> > During the rebase, I realized that the patch
> > 0002-Initialize-PartitionPruneContexts-lazily wasn't a good idea after
> > all.
> >
> > The test case added by bf826ea0629 highlighted an issue: initializing
> > pruning expressions lazily during execution could leave the
> > Append/MergeAppend node's PlanState.subPlan uninitialized at
> > ExecInitNode() time. Initially, I thought this would have only
> > cosmetic consequences -- such as changes in test case output where
> > SubPlans referenced in "exec" pruning expressions wouldn't appear --
> > but I may have underestimated the problem. As a result, I’ve abandoned
> > that approach and the patch in favor of initializing all pruning
> > expressions during plan initialization.
> >
> > Additionally, I revisited the impact of the main patch on
> > ExecutorStart_hooks. It seems better to change the return type from
> > void to bool, returning the result of
> > ExecPlanStillValid(queryDesc->estate). This change has the added
> > benefit of breaking extensions that use ExecutorStart_hook at compile
> > time, encouraging authors to update their code. The updated commit
> > message includes details on additional checks extensions must
> > implement, particularly for cases where they might access pruned and
> > thus unlocked relations.
> >
> > I've stared at the refactoring patches 0001 and 0002 for long enough
> > at this point that I'd like to commit them early next week, barring
> > further comments or objections. I'll keep staring at 0003.
>
> I have now pushed 0001 and 0002.
>
> I broke 0003 into two patches:
>
> Patch to track unpruned relations in the executor, allowing the
> overhead of processing pruned partitions to be skipped during plan
> initialization. This is particularly relevant for top-level nodes such
> as ModifyTable and LockRows, which -- unlike Append / MergeAppend --
> do not ignore initially pruned partitions. Since initial pruning is
> now performed separately from plan initialization and earlier in
> InitPlan(), we can fix this by checking whether a given child result
> relation or RowMark belongs to a pruned partition and skipping it.
>
> Patch to defer locking of prunable relations from GetCachedPlan() to
> InitPlan(), preventing partitions pruned by initial pruning from being
> locked unnecessarily.
>
> With the attached 0001, I can see that saving the overhead of
> initializing ResultRelInfos for pruned partitions in
> ExecInitModifyTable() results in a noticeable speedup for pgbench
> -Mprepared with partitions, especially at higher partition counts
> where the overhead is more significant. The numbers I have here are a
> bit noisy, but they provide a general idea of the performance benefit
> of skipping initially pruned partitions during plan initialization.
>
> Setup:
>
> plan_cache_mode = force_generic_plan
> max_locks_per_transaction = 1000
>
> for i in 100 200 500 1000 2000; do
> echo -ne "$i\t"
> pgbench -i --partitions=$i > /dev/null 2>&1;
> pgbench -n -Mprepared -T 10 | grep tps;
> done
>
> With master:
> 100 tps = 2837.095192 (without initial connection time)
> 200 tps = 2614.143255 (without initial connection time)
> 500 tps = 1960.666074 (without initial connection time)
> 1000 tps = 1390.691229 (without initial connection time)
> 2000 tps = 884.882656 (without initial connection time)
>
>
> With 0001:
> 100 tps = 2889.600827 (without initial connection time)
> 200 tps = 2720.895632 (without initial connection time)
> 500 tps = 2096.177756 (without initial connection time)
> 1000 tps = 1659.265873 (without initial connection time)
> 2000 tps = 1148.976177 (without initial connection time)
>
> With 0002:
> 100 tps = 3070.137629 (without initial connection time)
> 200 tps = 4589.336857 (without initial connection time)
> 500 tps = 2977.339119 (without initial connection time)
> 1000 tps = 2885.417560 (without initial connection time)
> 2000 tps = 3832.111167 (without initial connection time)
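To put the "master" and "0001" figures quoted above on a common scale, the relative change per partition count can be computed as below. This is just a small arithmetic sketch: the demo_* names are hypothetical, and the tps values are copied verbatim from the quoted runs, which are described there as noisy.

```c
#include <assert.h>

/* Partition counts and tps figures copied from the quoted pgbench runs. */
static const int    demo_nparts[]     = {100, 200, 500, 1000, 2000};
static const double demo_master_tps[] = {2837.095192, 2614.143255, 1960.666074,
                                         1390.691229, 884.882656};
static const double demo_p0001_tps[]  = {2889.600827, 2720.895632, 2096.177756,
                                         1659.265873, 1148.976177};

/* Relative throughput change, in percent, going from base_tps to new_tps. */
static double
demo_speedup_pct(double base_tps, double new_tps)
{
    assert(base_tps > 0.0);
    return (new_tps / base_tps - 1.0) * 100.0;
}

/* Speedup of 0001 over master for the i-th partition count (0 <= i < 5). */
static double
demo_speedup_at(int i)
{
    assert(i >= 0 && i < 5);
    (void) demo_nparts;         /* kept only to document what each slot means */
    return demo_speedup_pct(demo_master_tps[i], demo_p0001_tps[i]);
}
```

On these numbers the gain from 0001 comes out at roughly 2% for 100 partitions and roughly 30% for 2000, consistent with the remark that the saving grows with the partition count.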
Per cfbot-ci, the new test case output in 0002 needed to be updated.
I plan to push 0001 tomorrow, barring any objections.
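For readers skimming 0001, the bookkeeping it adds can be modelled in a few lines. This is a rough sketch only, assuming a 64-bit mask in place of Bitmapset (so RT indexes 1..63); all Demo*/demo_* names are hypothetical and none of this is the patch's actual code. It starts from the unprunable set, adds the RT index of each partition that survives initial pruning via a leafpart_rti_map-style array (0 meaning no RT index is recorded for that partition), and then skips child rowmarks whose relation is not in the resulting set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-width stand-in for Bitmapset. */
typedef uint64_t DemoRelids;

static DemoRelids
demo_add_member(DemoRelids set, int rti)
{
    assert(rti > 0 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

static bool
demo_is_member(int rti, DemoRelids set)
{
    return rti > 0 && rti < 64 && ((set >> rti) & 1) != 0;
}

/*
 * Model of the initial-pruning step: translate each surviving partition
 * index into an RT index and add it to the unprunable set.
 */
static DemoRelids
demo_compute_unpruned(DemoRelids unprunable, const int *leafpart_rti_map,
                      const int *surviving_parts, int nsurviving)
{
    DemoRelids unpruned = unprunable;

    for (int i = 0; i < nsurviving; i++)
    {
        int rti = leafpart_rti_map[surviving_parts[i]];

        if (rti != 0)
            unpruned = demo_add_member(unpruned, rti);
    }
    return unpruned;
}

/* Hypothetical, trimmed-down rowmark: just the two fields the check reads. */
typedef struct DemoRowMark
{
    int  rti;
    bool isParent;
} DemoRowMark;

/* Model of the rowmark loop: count the marks that would be initialized. */
static int
demo_count_initialized_rowmarks(const DemoRowMark *marks, int nmarks,
                                DemoRelids unpruned)
{
    int n = 0;

    for (int i = 0; i < nmarks; i++)
    {
        /* Skip parent rowmarks and those of pruned child relations. */
        if (marks[i].isParent || !demo_is_member(marks[i].rti, unpruned))
            continue;
        n++;
    }
    return n;
}
```

With a parent at RT index 1, leaf partitions at 2..4 and only the middle partition surviving, just one of the three child rowmarks is initialized; the other two are skipped without the node having to look at pruning results itself.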
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v62-0001-Track-unpruned-relids-to-avoid-processing-pruned.patch (40.2K, 2-v62-0001-Track-unpruned-relids-to-avoid-processing-pruned.patch)
From e16aa3108a447cd7d9fe8ea7a6e888b32348495d Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 5 Feb 2025 17:27:36 +0900
Subject: [PATCH v62 1/2] Track unpruned relids to avoid processing pruned
relations
This commit introduces changes to track unpruned relations explicitly,
making it possible for top-level plan nodes, such as ModifyTable and
LockRows, to avoid processing partitions pruned during initial
pruning. Scan-level nodes, such as Append and MergeAppend, already
avoid the unnecessary processing by accessing partition pruning
results directly via part_prune_index. In contrast, top-level nodes
cannot access pruning results directly and need to determine which
partitions remain unpruned.
To address this, the executor introduces a new bitmapset field,
es_unpruned_relids, in EState, which tracks the set of unpruned
relations at plan initialization. This field is referenced during plan
initialization to skip initializing certain nodes for pruned
partitions. It is initialized with PlannedStmt.unprunableRelids, a
new field that the planner populates with RT indexes of relations that
cannot be pruned during runtime pruning. These include relations not
subject to partition pruning and those required for execution
regardless of pruning.
PlannedStmt.unprunableRelids is computed during set_plan_refs() by
removing the RT indexes of runtime-prunable relations, identified
from PartitionPruneInfos, from the full set of relation RT indexes.
ExecDoInitialPruning() then updates es_unpruned_relids by adding
partitions that survive initial pruning.
To support this, PartitionedRelPruneInfo and PartitionedRelPruningData
now include a leafpart_rti_map[] array that maps partition indexes to
their corresponding RT indexes. The former is used in set_plan_refs()
when constructing unprunableRelids, while the latter is used in
ExecDoInitialPruning() to convert partition indexes returned by
get_matching_partitions() into RT indexes, which are then added to
es_unpruned_relids.
These changes make it possible for ModifyTable and LockRows nodes to
process only relations that remain unpruned after initial pruning.
ExecInitModifyTable() trims lists, such as resultRelations,
withCheckOptionLists, returningLists, and updateColnosLists, to
consider only unpruned partitions. It also creates ResultRelInfo
structs only for these partitions. Similarly, child RowMarks for
pruned relations are skipped.
By avoiding unnecessary initialization of structures for pruned
partitions, these changes improve the performance of updates and
deletes on partitioned tables when initial runtime pruning is used.
Due to ExecInitModifyTable() changes as described above, EXPLAIN on a
plan for UPDATE and DELETE that uses runtime initial pruning no longer
lists partitions pruned during initial pruning.
Reviewed-by: Robert Haas <[email protected]> (earlier versions)
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/commands/copyfrom.c | 3 +-
src/backend/executor/execMain.c | 19 ++++-
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 83 ++++++++++++++++---
src/backend/executor/execUtils.c | 12 ++-
src/backend/executor/nodeAppend.c | 8 +-
src/backend/executor/nodeLockRows.c | 9 +-
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/nodeModifyTable.c | 70 +++++++++++++---
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 29 ++++++-
src/backend/partitioning/partprune.c | 15 ++++
src/backend/replication/logical/worker.c | 3 +-
src/backend/replication/pgoutput/pgoutput.c | 3 +-
src/include/executor/execPartition.h | 6 +-
src/include/executor/executor.h | 3 +-
src/include/nodes/execnodes.h | 10 +++
src/include/nodes/pathnodes.h | 8 ++
src/include/nodes/plannodes.h | 7 ++
src/test/regress/expected/partition_prune.out | 46 ++++++++++
src/test/regress/sql/partition_prune.sql | 20 +++++
21 files changed, 320 insertions(+), 39 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f5602..da1e8ddc5a1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -768,7 +768,8 @@ CopyFrom(CopyFromState cstate)
* index-entry-making machinery. (There used to be a huge amount of code
* here that basically duplicated execUtils.c ...)
*/
- ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos);
+ ExecInitRangeTable(estate, cstate->range_table, cstate->rteperminfos,
+ bms_make_singleton(1));
resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, resultRelInfo, 1);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 604cb0625b8..5b989074203 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,7 +851,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
/*
* initialize the node's execution state
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
+ bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
@@ -881,8 +882,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Relation relation;
ExecRowMark *erm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* get relation's OID (will produce InvalidOid if subquery) */
@@ -2933,6 +2939,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
}
}
+ /*
+ * Copy es_unpruned_relids so that RowMarks of pruned relations are
+ * ignored in ExecInitLockRows() and ExecInitModifyTable() when
+ * initializing the plan trees below.
+ */
+ rcestate->es_unpruned_relids = parentestate->es_unpruned_relids;
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9c313d81315..134ff62f5cb 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
+ pstmt->unprunableRelids = estate->es_unpruned_relids;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 57245349cec..b6e89d0620d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,7 +182,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -196,7 +197,8 @@ static void InitExecPartitionPruneContexts(PartitionPruneState *prunstate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis);
/*
@@ -1820,9 +1822,12 @@ ExecDoInitialPruning(EState *estate)
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
+ Bitmapset *all_leafpart_rtis = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo,
+ &all_leafpart_rtis);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
@@ -1831,7 +1836,13 @@ ExecDoInitialPruning(EState *estate)
* bitmapset or NULL as described in the header comment.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ else
+ validsubplan_rtis = all_leafpart_rtis;
+
+ estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
+ validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
@@ -1944,9 +1955,16 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* initialized here. Those required for exec pruning are initialized later in
* ExecInitPartitionExecPruning(), as they depend on the availability of the
* parent plan node's PlanState.
+ *
+ * If initial pruning steps are to be skipped (e.g., during EXPLAIN
+ * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
+ * all leaf partitions whose scanning subnode is included in the parent plan
+ * node's list of child plans. The caller must add these RT indexes to
+ * estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
+ Bitmapset **all_leafpart_rtis)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2039,8 +2057,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* The set of partitions that exist now might not be the same that
* existed when the plan was made. The normal case is that it is;
* optimize for that case with a quick comparison, and just copy
- * the subplan_map and make subpart_map point to the one in
- * PruneInfo.
+ * the subplan_map and make subpart_map, leafpart_rti_map point to
+ * the ones in PruneInfo.
*
* For the case where they aren't identical, we could have more
* partitions on either side; or even exactly the same number of
@@ -2059,6 +2077,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
sizeof(int) * partdesc->nparts) == 0)
{
pprune->subpart_map = pinfo->subpart_map;
+ pprune->leafpart_rti_map = pinfo->leafpart_rti_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
}
@@ -2079,6 +2098,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* mismatches.
*/
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->leafpart_rti_map = palloc(sizeof(int) * partdesc->nparts);
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
@@ -2096,6 +2116,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->leafpart_rti_map[pp_idx] =
+ pinfo->leafpart_rti_map[pd_idx];
pd_idx++;
continue;
}
@@ -2133,6 +2155,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map[pp_idx] = -1;
pprune->subplan_map[pp_idx] = -1;
+ pprune->leafpart_rti_map[pp_idx] = 0;
}
}
@@ -2174,6 +2197,25 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
prunestate->execparamids = bms_add_members(prunestate->execparamids,
pinfo->execparamids);
+ /*
+ * Return the RT indexes of all leaf partitions if we're skipping
+ * pruning in the EXPLAIN (GENERIC_PLAN) case.
+ */
+ if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
+ {
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
+ rtindex);
+ }
+ }
+
j++;
}
i++;
@@ -2439,10 +2481,15 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * The caller must pass a non-NULL validsubplan_rtis during initial pruning
+ * to collect the RT indexes of leaf partitions whose subnodes will be
+ * executed. These RT indexes are later added to EState.es_unpruned_relids.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2454,6 +2501,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* evaluated *and* there are steps in which to do so.
*/
Assert(initial_prune || prunestate->do_exec_prune);
+ Assert(validsubplan_rtis != NULL || !initial_prune);
/*
* Switch to a temp context to avoid leaking memory in the executor's
@@ -2477,7 +2525,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, validsubplan_rtis);
/*
* Expression eval may have used space in ExprContext too. Avoid
@@ -2495,6 +2543,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_copy(*validsubplan_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2505,13 +2555,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of their corresponding leaf partitions to *validsubplan_rtis if
+ * it's non-NULL.
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *partset;
int i;
@@ -2538,8 +2591,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
+ pprune->leafpart_rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2547,7 +2605,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ validsubplan_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 00564985668..c9c756f8568 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -771,7 +771,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
* indexed by rangetable index.
*/
void
-ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
+ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids)
{
/* Remember the range table List as-is */
estate->es_range_table = rangeTable;
@@ -782,6 +783,15 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
/* Set size of associated arrays */
estate->es_range_table_size = list_length(rangeTable);
+ /*
+ * Initialize the bitmapset of RT indexes (es_unpruned_relids)
+ * representing relations that will be scanned during execution. This set
+ * is initially populated by the caller and may be extended later by
+ * ExecDoInitialPruning() to include RT indexes of unpruned leaf
+ * partitions.
+ */
+ estate->es_unpruned_relids = unpruned_relids;
+
/*
* Allocate an array to store an open Relation corresponding to each
* rangetable entry, and initialize entries to NULL. Relations are opened
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 2397e5e17b0..15c4227cc62 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -595,7 +595,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -662,7 +662,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -738,7 +738,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -891,7 +891,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 4e4e3db0b38..a8afbf93b48 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -347,8 +347,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at runtime.
+ * Also ignore the rowmarks belonging to child tables that have been
+ * pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index b2dc6626c99..405e8f94285 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -233,7 +233,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bc82e035ba2..349ed2d6d2c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -690,7 +690,7 @@ ExecInitUpdateProjection(ModifyTableState *mtstate,
Assert(whichrel >= 0 && whichrel < mtstate->mt_nrels);
}
- updateColnos = (List *) list_nth(node->updateColnosLists, whichrel);
+ updateColnos = (List *) list_nth(mtstate->mt_updateColnosLists, whichrel);
/*
* For UPDATE, we use the old tuple to fill up missing values in the tuple
@@ -4453,7 +4453,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ModifyTableState *mtstate;
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
- int nrels = list_length(node->resultRelations);
+ int nrels;
+ List *resultRelations = NIL;
+ List *withCheckOptionLists = NIL;
+ List *returningLists = NIL;
+ List *updateColnosLists = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4463,6 +4467,45 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+ /*
+ * Only consider unpruned relations for initializing their ResultRelInfo
+ * struct and other fields such as withCheckOptions, etc.
+ */
+ i = 0;
+ foreach(l, node->resultRelations)
+ {
+ Index rti = lfirst_int(l);
+
+ if (bms_is_member(rti, estate->es_unpruned_relids))
+ {
+ resultRelations = lappend_int(resultRelations, rti);
+ if (node->withCheckOptionLists)
+ {
+ List *withCheckOptions = list_nth_node(List,
+ node->withCheckOptionLists,
+ i);
+
+ withCheckOptionLists = lappend(withCheckOptionLists, withCheckOptions);
+ }
+ if (node->returningLists)
+ {
+ List *returningList = list_nth_node(List,
+ node->returningLists,
+ i);
+
+ returningLists = lappend(returningLists, returningList);
+ }
+ if (node->updateColnosLists)
+ {
+ List *updateColnosList = list_nth(node->updateColnosLists, i);
+
+ updateColnosLists = lappend(updateColnosLists, updateColnosList);
+ }
+ }
+ i++;
+ }
+ nrels = list_length(resultRelations);
+
/*
* create state structure
*/
@@ -4483,6 +4526,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->mt_merge_inserted = 0;
mtstate->mt_merge_updated = 0;
mtstate->mt_merge_deleted = 0;
+ mtstate->mt_updateColnosLists = updateColnosLists;
/*----------
* Resolve the target relation. This is the same as:
@@ -4500,6 +4544,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
if (node->rootRelation > 0)
{
+ Assert(bms_is_member(node->rootRelation, estate->es_unpruned_relids));
mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
node->rootRelation);
@@ -4514,7 +4559,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
- node->epqParam, node->resultRelations);
+ node->epqParam, resultRelations);
mtstate->fireBSTriggers = true;
/*
@@ -4532,7 +4577,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
resultRelInfo = mtstate->resultRelInfo;
i = 0;
- foreach(l, node->resultRelations)
+ foreach(l, resultRelations)
{
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
@@ -4676,7 +4721,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize any WITH CHECK OPTION constraints if needed.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->withCheckOptionLists)
+ foreach(l, withCheckOptionLists)
{
List *wcoList = (List *) lfirst(l);
List *wcoExprs = NIL;
@@ -4699,7 +4744,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/*
* Initialize RETURNING projections if needed.
*/
- if (node->returningLists)
+ if (returningLists)
{
TupleTableSlot *slot;
ExprContext *econtext;
@@ -4708,7 +4753,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize result tuple slot and assign its rowtype using the first
* RETURNING list. We assume the rest will look the same.
*/
- mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
+ mtstate->ps.plan->targetlist = (List *) linitial(returningLists);
/* Set up a slot for the output of the RETURNING projection(s) */
ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
@@ -4723,7 +4768,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->returningLists)
+ foreach(l, returningLists)
{
List *rlist = (List *) lfirst(l);
@@ -4824,8 +4869,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at runtime.
+ * Also ignore the rowmarks belonging to child tables that have been
+ * pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* Find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ffd7517ea97..7b1a8a0a9f1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -557,6 +557,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(glob->allRelids,
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 0868249be94..999a5a8ab5a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -564,7 +564,9 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
/*
* If it's a plain relation RTE (or a subquery that was once a view
- * reference), add the relation OID to relationOids.
+ * reference), add the relation OID to relationOids. Also add its new RT
+ * index to the set of relations to be potentially accessed during
+ * execution.
*
* We do this even though the RTE might be unreferenced in the plan tree;
* this would correspond to cases such as views that were expanded, child
@@ -576,7 +578,11 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
*/
if (newrte->rtekind == RTE_RELATION ||
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
+ {
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ glob->allRelids = bms_add_member(glob->allRelids,
+ list_length(glob->finalrtable));
+ }
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
@@ -1740,6 +1746,10 @@ set_customscan_references(PlannerInfo *root,
*
* Also update the RT indexes present in PartitionedRelPruneInfos to add the
* offset.
+ *
+ * Finally, if there are initial pruning steps, add the RT indexes of the
+ * leaf partitions to the set of relations that are prunable at execution
+ * startup time.
*/
static int
register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
@@ -1762,6 +1772,7 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ int i;
prelinfo->rtindex += rtoffset;
prelinfo->initial_pruning_steps =
@@ -1770,6 +1781,22 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
prelinfo->exec_pruning_steps =
fix_scan_list(root, prelinfo->exec_pruning_steps,
rtoffset, 1);
+
+ for (i = 0; i < prelinfo->nparts; i++)
+ {
+ /*
+ * Non-leaf partitions and partitions that do not have a
+ * subplan are not included in this map as mentioned in
+ * make_partitionedrel_pruneinfo().
+ */
+ if (prelinfo->leafpart_rti_map[i])
+ {
+ prelinfo->leafpart_rti_map[i] += rtoffset;
+ if (prelinfo->initial_pruning_steps)
+ glob->prunableRelids = bms_add_member(glob->prunableRelids,
+ prelinfo->leafpart_rti_map[i]);
+ }
+ }
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 4693eef0c58..ff926732f36 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -645,6 +645,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ int *leafpart_rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -657,6 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ leafpart_rti_map = (int *) palloc0(nparts * sizeof(int));
present_parts = NULL;
i = -1;
@@ -671,9 +673,21 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of "leaf" partitions so they can be
+ * included in the PlannerGlobal.prunableRelids set, indicating
+ * relations that may be pruned during executor startup.
+ *
+ * Only leaf partitions with a valid subplan that are prunable
+ * using initial pruning are added to prunableRelids. So
+ * partitions without a subplan due to constraint exclusion will
+ * remain in PlannedStmt.unprunableRelids.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ leafpart_rti_map[i] = (int) partrel->relid;
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -695,6 +709,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->leafpart_rti_map = leafpart_rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 6966037d2ef..f09ab41c605 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -668,7 +668,8 @@ create_edata_for_relation(LogicalRepRelMapEntry *rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 0227fcbca3d..2f89996a757 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -820,7 +820,8 @@ create_estate_for_relation(Relation rel)
addRTEPermissionInfo(&perminfos, rte);
- ExecInitRangeTable(estate, list_make1(rte), perminfos);
+ ExecInitRangeTable(estate, list_make1(rte), perminfos,
+ bms_make_singleton(1));
estate->es_output_cid = GetCurrentCommandId(false);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 855fed4fea5..951009cf46c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -48,6 +48,8 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * leafpart_rti_map RT index by partition index, or 0 if not a leaf
+ * partition.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -65,6 +67,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ int *leafpart_rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -135,6 +138,7 @@ extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 45b80e6b98e..30e2a82346f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -595,7 +595,8 @@ extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
-extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
+extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
+ Bitmapset *unpruned_relids);
extern void ExecCloseRangeTableRelations(EState *estate);
extern void ExecCloseResultRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index aca15f771a2..a2cba97e3d5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -658,6 +658,10 @@ typedef struct EState
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
+ Bitmapset *es_unpruned_relids; /* PlannedStmt.unprunableRelids + RT
+ * indexes of leaf partitions that survive
+ * initial pruning; see
+ * ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -1440,6 +1444,12 @@ typedef struct ModifyTableState
double mt_merge_inserted;
double mt_merge_updated;
double mt_merge_deleted;
+
+ /*
+ * List of valid updateColnosLists. Contains only those belonging to
+ * unpruned relations from ModifyTable.updateColnosLists.
+ */
+ List *mt_updateColnosLists;
} ModifyTableState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 52d44f43021..2fe5179ca77 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,14 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of all relation RTEs in finalrtable (RTE_RELATION and
+ * RTE_SUBQUERY RTEs of views) and of those that are subject to runtime
+ * pruning at plan initialization time ("initial" pruning).
+ */
+ Bitmapset *allRelids;
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 06d9559ebb9..4abefa7bec0 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; set for
+ * AcquireExecutorLocks(). */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1483,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* subpart index by partition index, or -1 */
int *subpart_map pg_node_attr(array_size(nparts));
+ /* RT index by partition index, or 0 if not a leaf partition */
+ int *leafpart_rti_map pg_node_attr(array_size(nparts));
+
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index f0707e7f7ea..e667503c961 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4469,3 +4469,49 @@ drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+-- Only the unpruned partition should be shown in the list of relations to be
+-- updated
+explain (costs off) execute update_part_abc_view (1, 'd');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (1, 'd');
+ a | b | c
+---+---+---
+ 1 | d | t
+(1 row)
+
+explain (costs off) execute update_part_abc_view (2, 'a');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_2 part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_2 part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (2, 'a');
+ERROR: new row violates check option for view "part_abc_view"
+DETAIL: Failing row contains (2, a, t).
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ea9a4fe4a23..730545e86a7 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1354,3 +1354,23 @@ drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+-- Only the unpruned partition should be shown in the list of relations to be
+-- updated
+explain (costs off) execute update_part_abc_view (1, 'd');
+execute update_part_abc_view (1, 'd');
+explain (costs off) execute update_part_abc_view (2, 'a');
+execute update_part_abc_view (2, 'a');
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
--
2.43.0
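For readers following along, the core of the ExecInitModifyTable() change in 0001 is that several per-relation lists are filtered in lockstep by membership in es_unpruned_relids. Below is a minimal standalone sketch of that idea, not the patch's code: RelidSet is a toy 64-bit stand-in for Bitmapset, plain int arrays stand in for the List structures, and relidset_add(), relidset_is_member() and filter_unpruned() are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of RT indexes; handles indexes 1..63 only. */
typedef uint64_t RelidSet;

/* Return "set" with RT index "rti" added. */
static RelidSet
relidset_add(RelidSet set, int rti)
{
    assert(rti > 0 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

/* Return 1 if "rti" is in "set", else 0. Out-of-range indexes are not members. */
static int
relidset_is_member(RelidSet set, int rti)
{
    if (rti <= 0 || rti >= 64)
        return 0;
    return (int) ((set >> rti) & 1);
}

/*
 * Copy into out_rtis/out_colnos only those entries whose RT index is in
 * "unpruned". The two input arrays are parallel (entry i of each describes
 * the same result relation), and the outputs stay parallel too.
 * Returns the number of entries kept.
 */
static int
filter_unpruned(const int *rtis, const int *colnos, int n,
                RelidSet unpruned, int *out_rtis, int *out_colnos)
{
    int nkept = 0;

    for (int i = 0; i < n; i++)
    {
        if (relidset_is_member(unpruned, rtis[i]))
        {
            out_rtis[nkept] = rtis[i];
            out_colnos[nkept] = colnos[i];
            nkept++;
        }
    }
    return nkept;
}
```

The point of filtering all the lists in one pass is that later loops can keep walking them side by side with the ResultRelInfo array, which now has only nkept entries.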
[application/octet-stream] v62-0002-Don-t-lock-partitions-pruned-by-initial-pruning.patch (88.4K, 3-v62-0002-Don-t-lock-partitions-pruned-by-initial-pruning.patch)
From 40cf733e603923cc32b375fde5a1303d2e6a4fa0 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 5 Feb 2025 17:38:11 +0900
Subject: [PATCH v62 2/2] Don't lock partitions pruned by initial pruning
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead. This commit defers locking for such
relations and ensures that any invalidation caused by this deferral
triggers replanning when needed.

AcquireExecutorLocks() now locks only unprunable relations.
Runtime-prunable relations are instead locked during executor startup,
after initial pruning, so that only the partitions that survive
pruning are locked. This minimizes overhead and reduces contention in
workloads involving partitioned tables.

This results in significant speedups for generic plans with many
runtime-prunable partitions.

ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks are properly locked.

* Plan invalidation handling:

  Deferring locks introduces a window where prunable relations may be
  altered by concurrent DDL, invalidating the plan. A new function,
  ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and
  handle invalidation caused by deferred locking. If invalidation
  occurs, ExecutorStartCachedPlan() updates CachedPlan using the new
  UpdateCachedPlan() function and retries execution with the refreshed
  plan.

  UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A
  new CachedPlan.stmt_context, as a child of CachedPlan.context,
  allows freeing old PlannedStmts while preserving the CachedPlan
  structure and statements list.

  ExecutorStart() and ExecutorStart_hook now return a boolean value
  indicating whether plan initialization succeeded with a valid
  PlanState tree in QueryDesc.planstate.

* Testing:

  The delay_execution module tests scenarios where cached plans become
  invalid due to changes in prunable relations after deferred locks.

* Note to extension authors:

  ExecutorStart_hook implementations must verify plan validity after
  calling standard_ExecutorStart(). For example:

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;

  Extensions that access child relations, especially prunable partitions,
  via ExecGetRangeTableRelation() must now ensure that their RT indexes
  are present in es_unpruned_relids, as failing to do so will result in
  an error. This is important because, after this change, only relations
  in that set are locked.

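
The retry behavior described under "Plan invalidation handling" can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the patch's implementation: ToyCachedPlan and every toy_* function are hypothetical stand-ins, and "schema_version" is an invented way to simulate concurrent DDL making a cached plan stale. The real ExecutorStartCachedPlan() and UpdateCachedPlan() operate on QueryDesc and CachedPlan, which are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cached plan (hypothetical, for illustration only). */
typedef struct ToyCachedPlan
{
    int generation;     /* schema version the plan was built against */
    int replans;        /* number of times the plan has been rebuilt */
} ToyCachedPlan;

/*
 * Stand-in for ExecutorStart(): report whether plan initialization
 * succeeded, i.e. whether the plan still matches the current schema.
 */
static bool
toy_executor_start(const ToyCachedPlan *plan, int schema_version)
{
    return plan->generation == schema_version;
}

/* Stand-in for UpdateCachedPlan(): rebuild the plan for the current schema. */
static void
toy_update_cached_plan(ToyCachedPlan *plan, int schema_version)
{
    plan->generation = schema_version;
    plan->replans++;
}

/*
 * Stand-in for ExecutorStartCachedPlan(): keep refreshing the plan and
 * retrying startup until startup reports a valid plan.
 */
static void
toy_executor_start_cached_plan(ToyCachedPlan *plan, int schema_version)
{
    assert(plan != NULL);

    while (!toy_executor_start(plan, schema_version))
        toy_update_cached_plan(plan, schema_version);
}
```

In this model a plan that is still current starts on the first attempt with no replanning, while a stale one is rebuilt once and then starts successfully. A hook that ignored the boolean result would carry on as if startup had succeeded, which is why the note above asks hooks to return false early.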
Reviewed-by: Robert Haas (earlier versions)
Reviewed-by: David Rowley (earlier versions)
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 16 +-
.../pg_stat_statements/pg_stat_statements.c | 16 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 5 +-
src/backend/commands/explain.c | 22 +-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 5 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/commands/trigger.c | 15 +
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 130 ++++++++-
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 38 ++-
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 4 +-
src/backend/executor/spi.c | 29 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 51 +++-
src/backend/utils/cache/plancache.c | 204 +++++++++++--
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 6 +-
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 34 ++-
src/include/nodes/execnodes.h | 3 +
src/include/utils/plancache.h | 50 +++-
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 ++++-
.../expected/cached-plan-inval.out | 270 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 80 ++++++
33 files changed, 1039 insertions(+), 99 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f1ad876e821..82c17c0a28a 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -76,7 +76,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -256,9 +256,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -294,9 +296,13 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
if (auto_explain_enabled())
{
@@ -314,6 +320,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bebf8134eb0..b735381cb0b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -332,7 +332,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -986,13 +986,19 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1015,6 +1021,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
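For extension authors reading the auto_explain and pg_stat_statements changes above, the new ExecutorStart hook contract can be sketched roughly as follows. This is only an illustrative mock, not PostgreSQL code: the QueryDesc here is a hypothetical two-field stand-in, standard_ExecutorStart() is a stub, and my_ExecutorStart, plan_goes_stale and tracking_started are invented names. The point it tries to show is that a hook should propagate a false result and skip its own setup when the plan was invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QueryDesc; the real struct is much larger. */
typedef struct QueryDesc
{
    bool plan_goes_stale;   /* simulate invalidation during startup */
    int  tracking_started;  /* how many times the hook set up its own state */
} QueryDesc;

typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);

/* Previously installed hook, if any (none in this mock). */
static ExecutorStart_hook_type prev_ExecutorStart = NULL;

/* Stub: returns false when the (mock) plan became invalid. */
static bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    (void) eflags;
    return !queryDesc->plan_goes_stale;
}

/* Hook following the pattern in the diff: chain, check, then do own work. */
static bool
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    bool plan_valid;

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    /* The plan may have become invalid; leave without touching our state. */
    if (!plan_valid)
        return false;

    queryDesc->tracking_started++;
    return true;
}
```

In this mock, a hook that ignored the return value would set up tracking for a plan that is about to be thrown away, which is the situation the early return avoids.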
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..091fbc12cc5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -556,7 +556,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -566,7 +566,8 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- ExecutorStart(cstate->queryDesc, 0);
+ if (!ExecutorStart(cstate->queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 23cecd99c9e..44b4665ccd3 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -332,12 +332,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ if (!ExecutorStart(queryDesc, GetIntoRelEFlags(into)))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c24e66f82e1..af25c16d215 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -519,7 +519,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -641,7 +642,9 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -697,7 +700,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -711,8 +714,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Prepare the plan for execution. */
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index ba540e3de5b..1b28d20412e 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -907,11 +907,13 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ if (!ExecutorStart(qdesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
ExecutorRun(qdesc, ForwardScanDirection, 0);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c12817091ed..0bfbc5ca6dc 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,12 +438,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index e7c8171c102..4c2ac045224 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -117,6 +117,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 8989c0c882d..c025b1f9f8c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -582,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int query_index = 0;
if (es->memory)
{
@@ -654,7 +656,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -665,6 +668,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ query_index++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7a5ffe32f60..f5f63c89a80 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5048,6 +5048,21 @@ AfterTriggerBeginQuery(void)
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by standard_ExecutorEnd() if the query execution was aborted due to
+ * the plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
+
/* ----------
* AfterTriggerEndQuery()
*
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be613..449c6068ae9 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+flag. If the plan tree is outdated (is_valid = false), the executor stops
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartCachedPlan
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), the control is
+immediately returned to ExecutorStartCachedPlan(), which will create a new plan
+tree and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5b989074203..ec2387d7f1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,11 +55,13 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -114,11 +116,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Return value indicates if the plan has been initialized successfully so
+ * that queryDesc->planstate contains a valid PlanState tree. It may not
+ * if the plan got invalidated during InitPlan().
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -130,12 +137,14 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
+ plan_valid = (*ExecutorStart_hook) (queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ return plan_valid;
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
EState *estate;
@@ -259,6 +268,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ return ExecPlanStillValid(queryDesc->estate);
+}
+
+/*
+ * ExecutorStartCachedPlan
+ * Start execution for a given query in the CachedPlanSource, replanning
+ * if the plan is invalidated due to deferred locks taken during the
+ * plan's initialization
+ *
+ * This function handles cases where the CachedPlan given in queryDesc->cplan
+ * might become invalid during the initialization of the plan given in
+ * queryDesc->plannedstmt, particularly when prunable relations in it are
+ * locked after performing initial pruning. If the locks invalidate the plan,
+ * the function calls UpdateCachedPlan() to replan all queries in the
+ * CachedPlan, and then retries initialization.
+ *
+ * The function repeats the process until ExecutorStart() successfully
+ * initializes the plan, that is without the CachedPlan becoming invalid.
+ */
+void
+ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (unlikely(queryDesc->cplan == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
+ if (unlikely(plansource == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlanSource");
+
+ /*
+ * Loop and retry with an updated plan until no further invalidation
+ * occurs.
+ */
+ while (1)
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /*
+ * Clean up the current execution state before creating the new
+ * plan to retry ExecutorStart(). Mark execution as aborted to
+ * ensure that AFTER trigger state is properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ /* Retry ExecutorStart() with an updated plan tree. */
+ queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+ }
+ else
+
+ /*
+ * Exit the loop if the plan is initialized successfully and no
+ * sinval messages were received that invalidated the CachedPlan.
+ */
+ break;
+ }
}
/* ----------------------------------------------------------------
@@ -317,6 +384,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -423,8 +491,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -487,11 +558,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
(PgStat_Counter) estate->es_parallel_workers_launched);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -505,6 +575,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -603,6 +681,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -828,6 +921,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If the plan originates from a CachedPlan (given in queryDesc->cplan),
+ * it can become invalid during runtime "initial" pruning when the
+ * remaining set of locks is taken. The function returns early in that
+ * case without initializing the plan, and the caller is expected to
+ * retry with a new valid plan.
* ----------------------------------------------------------------
*/
static void
@@ -835,6 +934,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -855,6 +955,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -865,9 +966,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* executed, are saved in es_part_prune_results. These results correspond
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
*/
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2868,6 +2975,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
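The retry behaviour of ExecutorStartCachedPlan() in the execMain.c changes above can be modelled with a small standalone mock. Everything below is an assumption made for illustration: MockQueryDesc, mock_executor_start(), mock_executor_end(), mock_update_cached_plan() and mock_start_cached_plan() are hypothetical names, and a plain counter stands in for sinval-driven invalidation. It is meant as a sketch of the control flow only (start, clean up on failure, replan, retry), not of the real locking or memory handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pieces of QueryDesc/EState the loop touches. */
typedef struct MockQueryDesc
{
    int plan_generation;    /* bumped each time the plan is rebuilt */
    int invalidations_left; /* pending concurrent changes that stale the plan */
    int cleanups;           /* how many aborted startups were cleaned up */
} MockQueryDesc;

/* Mock of ExecutorStart(): false means the plan went stale during init. */
static bool
mock_executor_start(MockQueryDesc *qd)
{
    if (qd->invalidations_left > 0)
    {
        qd->invalidations_left--;
        return false;
    }
    return true;
}

/* Mock of ExecutorEnd() on an aborted startup. */
static void
mock_executor_end(MockQueryDesc *qd)
{
    qd->cleanups++;
}

/* Mock of UpdateCachedPlan(): hands back a fresh plan version. */
static int
mock_update_cached_plan(const MockQueryDesc *qd)
{
    return qd->plan_generation + 1;
}

/* Loop until startup succeeds without the plan being invalidated. */
static void
mock_start_cached_plan(MockQueryDesc *qd)
{
    for (;;)
    {
        if (mock_executor_start(qd))
            break;

        /* Clean up the failed attempt, then retry with a new plan. */
        mock_executor_end(qd);
        qd->plan_generation = mock_update_cached_plan(qd);
    }
}
```

With two pending invalidations the mock replans twice and cleans up twice before succeeding; with none it returns after the first attempt.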
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 134ff62f5cb..1bedb808368 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1258,8 +1258,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1440,7 +1447,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ if (!ExecutorStart(queryDesc, fpes->eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b6e89d0620d..432eeaf9034 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -1768,7 +1769,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * all plan nodes that contain a PartitionPruneInfo.
+ * all plan nodes that contain a PartitionPruneInfo. This also locks the
+ * leaf partitions whose subnodes will be initialized if needed.
*
* ExecInitPartitionExecPruning:
* Updates the PartitionPruneState found at given part_prune_index in
@@ -1789,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * plan nodes that support partition pruning.
+ * plan nodes that support partition pruning. This also locks the leaf
+ * partitions whose subnodes will be initialized if needed.
*
* This function iterates over each PartitionPruneInfo entry in
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
@@ -1816,6 +1820,7 @@ void
ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
@@ -1841,11 +1846,40 @@ ExecDoInitialPruning(EState *estate)
else
validsubplan_rtis = all_leafpart_rtis;
+ if (ExecShouldLockRelations(estate))
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ locked_relids = lappend_int(locked_relids, rtindex);
+ }
+ }
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
+
+ /*
+ * Release the useless locks if the plan won't be executed. This is the
+ * same as what CheckCachedPlan() in plancache.c does.
+ */
+ if (!ExecPlanStillValid(estate))
+ {
+ foreach(lc, locked_relids)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
/*
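The lock-then-release logic added to ExecDoInitialPruning() above can be illustrated with a toy model. The sketch below assumes a fixed-size array of lock flags and a single relation whose locking "reveals" a concurrent change; MockState, mock_lock(), mock_unlock() and lock_surviving() are hypothetical names, not PostgreSQL functions. It only tries to show the shape of the approach: lock every surviving partition, remember what was locked, and release those locks if the plan turned out to be stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RELS 8

/* Hypothetical stand-in for lock-manager and plan-validity state. */
typedef struct MockState
{
    bool locked[MAX_RELS];  /* which relations we currently hold a lock on */
    bool plan_valid;        /* mirrors the idea of CachedPlan.is_valid */
    int  invalidating_rel;  /* locking this one stales the plan; -1 = none */
} MockState;

static void
mock_lock(MockState *st, int rel)
{
    st->locked[rel] = true;
    if (rel == st->invalidating_rel)
        st->plan_valid = false; /* a concurrent change was noticed */
}

static void
mock_unlock(MockState *st, int rel)
{
    st->locked[rel] = false;
}

/*
 * Lock all surviving relations; if the plan is stale afterwards, release
 * the locks just taken. Returns whether the plan is still valid.
 */
static bool
lock_surviving(MockState *st, const int *surviving, size_t n)
{
    int    taken[MAX_RELS];
    size_t ntaken = 0;
    size_t i;

    assert(n <= MAX_RELS);

    for (i = 0; i < n; i++)
    {
        mock_lock(st, surviving[i]);
        taken[ntaken++] = surviving[i];
    }

    if (!st->plan_valid)
    {
        for (i = 0; i < ntaken; i++)
            mock_unlock(st, taken[i]);
    }

    return st->plan_valid;
}
```

As in the patch, the mock does not stop locking at the first sign of invalidation; it finishes the pass and only then drops the locks it took.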
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c9c756f8568..fa55b4c6542 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -813,6 +814,10 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed in ExecEndPlan().
+ *
+ * Note: The caller must ensure that 'rti' refers to an unpruned relation
+ * (i.e., it is a member of estate->es_unpruned_relids) before calling this
+ * function. Attempting to open a pruned relation will result in an error.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -821,6 +826,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ if (!bms_is_member(rti, estate->es_unpruned_relids))
+ elog(ERROR, "trying to open a pruned relation");
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 757f8068e21..6aa8e9c4d8a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -864,7 +865,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+ if (!ExecutorStart(es->qd, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index ecb2e4ccaa1..3288396def3 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1685,7 +1686,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2500,6 +2502,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int query_index = 0;
spicallbackarg.query = plansource->query_string;
@@ -2690,14 +2693,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, query_index);
FreeQueryDesc(qdesc);
}
else
@@ -2794,6 +2799,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ query_index++;
}
/* Done with this plan, so release refcount */
@@ -2871,7 +2878,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2927,7 +2935,16 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
ExecutorRun(queryDesc, ForwardScanDirection, tcount);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5655348a2e2..f60f2785bc1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1224,6 +1224,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2025,7 +2026,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6f22496305a..dea24453a6c 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -36,6 +37,9 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +69,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +82,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +128,9 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +143,9 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,14 +157,23 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, 0, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* Run the plan to completion.
@@ -493,6 +514,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -512,9 +534,19 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (portal->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, myeflags,
+ portal->plansource, 0);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, myeflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* This tells PortalCleanup to shut down the executor
@@ -1188,6 +1220,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int query_index = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1269,6 +1302,9 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1278,6 +1314,9 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1342,6 +1381,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ query_index++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 55db8f53705..71839dca108 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,7 +101,8 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ bool release_generic);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -578,10 +579,17 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
+ *
+ * This also releases and drops the generic plan (plansource->gplan), if any,
+ * as most callers will typically build a new CachedPlan for the plansource
+ * right after this. However, when called from UpdateCachedPlan(), the
+ * function does not release the generic plan, as UpdateCachedPlan() updates
+ * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv)
+ QueryEnvironment *queryEnv,
+ bool release_generic)
{
bool snapshot_set;
RawStmt *rawtree;
@@ -678,8 +686,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference if any */
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference, if any, and if requested */
+ if (release_generic)
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -815,8 +824,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->gplan->stmt_list. However, the plans are not
+ * fully race-condition-free until the executor acquires locks on the prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -870,7 +882,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
- /* Successfully revalidated and locked the query. */
+ /*
+ * Successfully revalidated and locked the query. Set is_reused
+ * to true so that CachedPlanRequiresLocking() returns true.
+ */
+ plan->is_reused = true;
return true;
}
@@ -895,12 +911,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* To build a generic, parameter-value-independent plan, pass NULL for
* boundParams. To build a custom plan, pass the actual parameter values via
* boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * each parameter value; otherwise the planner will treat the value as a hint
+ * rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -911,6 +929,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
+ MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -928,7 +947,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/*
* If we don't already have a copy of the querytree list that can be
@@ -967,10 +986,19 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally we make a dedicated memory context for the CachedPlan and its
- * subsidiary data. (It's probably not going to be large, but just in
- * case, allow it to grow large. It's transient for the moment.) But for
- * a one-shot plan, we just leave it in the caller's memory context.
+ * Normally, we create a dedicated memory context for the CachedPlan and
+ * its subsidiary data. Although it's usually not very large, the context
+ * is designed to allow growth if necessary.
+ *
+ * The PlannedStmts are stored in a separate child context (stmt_context)
+ * of the CachedPlan's memory context. This separation allows
+ * UpdateCachedPlan() to free and replace the PlannedStmts without
+ * affecting the CachedPlan structure or its stmt_list List.
+ *
+ * For one-shot plans, we instead use the caller's memory context, as the
+ * CachedPlan will not persist. stmt_context will be set to NULL in this
+ * case, because UpdateCachedPlan() should never get called on a one-shot
+ * plan.
*/
if (!plansource->is_oneshot)
{
@@ -979,12 +1007,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- /*
- * Copy plan into the new context.
- */
- MemoryContextSwitchTo(plan_context);
+ stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan PlannedStmts",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
+ MemoryContextSetParent(stmt_context, plan_context);
+ MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
+
+ MemoryContextSwitchTo(plan_context);
+ plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1025,8 +1058,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
+ plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
+ plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1153,8 +1188,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid, but not all locks are acquired if the
+ * returned plan is a reused generic plan. In such cases, locks on relations
+ * subject to initial runtime pruning are not taken by CheckCachedPlan() but
+ * deferred until the execution startup phase, specifically when
+ * ExecDoInitialPruning() performs initial pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1180,7 +1218,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1276,6 +1314,113 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * UpdateCachedPlan
+ * Create fresh plans for all queries in the CachedPlanSource, replacing
+ * those in the generic plan's stmt_list, and return the plan for the
+ * query_index'th query.
+ *
+ * This function is primarily used by ExecutorStartCachedPlan() to handle
+ * cases where the original generic CachedPlan becomes invalid. Such
+ * invalidation may occur when prunable relations in the old plan for the
+ * query_index'th query are locked in preparation for execution.
+ *
+ * Note that invalidations received during the execution of the query_index'th
+ * query can affect both the queries that have already finished execution
+ * (e.g., due to concurrent modifications on prunable relations that were not
+ * locked during their execution) and also the queries that have not yet been
+ * executed. As a result, this function updates all plans to ensure
+ * CachedPlan.is_valid is safely set to true.
+ *
+ * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
+ * the caller and any of its callers must not rely on them remaining accessible
+ * after this function is called.
+ */
+PlannedStmt *
+UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ ListCell *l1,
+ *l2;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
+ else if (plan->is_oneshot)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan() returned
+ * the CachedPlan. See the comment in BuildCachedPlan() for details on why
+ * this might happen. Although the invalidation is likely a false positive,
+ * as stated there, we revalidate the plansource to ensure the query list used
+ * for planning is up to date.
+ *
+ * The risk of catching an invalidation is higher here than when
+ * BuildCachedPlan() is called from GetCachedPlan(), because this function
+ * is normally called long after GetCachedPlan() returns the CachedPlan,
+ * so much more processing could have occurred including things that mark
+ * the CachedPlanSource invalid.
+ *
+ * Note: Do not release plansource->gplan, because the upstream callers
+ * (such as the callers of ExecutorStartCachedPlan()) would still be
+ * referencing it.
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv, false);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for all the queries after making a copy to be
+ * scribbled on by the planner.
+ */
+ query_list = copyObject(query_list);
+
+ /*
+ * Planning work is done in the caller's memory context. The resulting
+ * PlannedStmts are then copied into plan->stmt_context after throwing away
+ * the old ones.
+ */
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+ Assert(list_length(plan_list) == list_length(plan->stmt_list));
+
+ MemoryContextReset(plan->stmt_context);
+ oldcxt = MemoryContextSwitchTo(plan->stmt_context);
+ forboth(l1, plan_list, l2, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst(l1);
+
+ lfirst(l2) = copyObject(plannedstmt);
+ }
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * XXX Should this also (re)set the properties of the CachedPlan that are
+ * set in BuildCachedPlan() after creating the fresh plans such as
+ * planRoleId, dependsOnRole, and saved_xmin?
+ */
+
+ /*
+ * We've updated all the plans that might have been invalidated, so mark
+ * the CachedPlan as valid.
+ */
+ plan->is_valid = true;
+
+ /* Also update generic_cost because we just created a new generic plan. */
+ plansource->generic_cost = cached_plan_cost(plan, false);
+
+ return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1654,7 +1799,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv);
+ RevalidateCachedQuery(plansource, queryEnv, true);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -1776,7 +1921,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,13 +1939,16 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 0be1c2b0fff..e3526e78064 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f4..570e7cad1fa 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,10 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 2ed2c4bb378..4180601dcd4 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..ba53305ad42 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 30e2a82346f..d12e3f451d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -72,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -191,8 +192,11 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -255,6 +259,30 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
+
+/*
+ * Locks are needed only if running a cached plan that might contain unlocked
+ * relations, such as a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a2cba97e3d5..9519dca374b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -655,6 +656,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
@@ -707,6 +709,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 46072d311b1..2d83f7d4930 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,8 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
+#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -139,10 +141,11 @@ typedef struct CachedPlanSource
* The reference count includes both the link from the parent CachedPlanSource
* (if any), and any active plan executions, so the plan can be discarded
* exactly when refcount goes to zero. Both the struct itself and the
- * subsidiary data live in the context denoted by the context field.
- * This makes it easy to free a no-longer-needed cached plan. (However,
- * if is_oneshot is true, the context does not belong solely to the CachedPlan
- * so no freeing is possible.)
+ * subsidiary data, except the PlannedStmts in stmt_list, live in the context
+ * denoted by the context field; the PlannedStmts live in the context denoted
+ * by stmt_context. Separate contexts make it easy to free a no-longer-needed
+ * cached plan. (However, if is_oneshot is true, the context does not belong
+ * solely to the CachedPlan so no freeing is possible.)
*/
typedef struct CachedPlan
{
@@ -150,6 +153,7 @@ typedef struct CachedPlan
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
+ bool is_reused; /* is it a reused generic plan? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
@@ -158,6 +162,10 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext stmt_context; /* context containing the PlannedStmts in
+ * stmt_list, but not the List itself which is
+ * in the above context; NULL if is_oneshot is
+ * true. */
} CachedPlan;
/*
@@ -223,6 +231,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -235,4 +247,34 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire additional locks?
+ *
+ * If the plan is a reused generic plan, the executor must acquire locks for
+ * relations that are not covered by AcquireExecutorLocks(), such as partitions
+ * that are subject to initial runtime pruning.
+ *
+ * Note: These locks are unnecessary if the plan is executed immediately after
+ * its creation, since the planner would have already acquired them. However,
+ * we do not optimize for that case.
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return !cplan->is_oneshot && cplan->is_reused;
+}
+
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check that the plan has not been invalidated
+ * after taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 0b62143af8b..ddee031f551 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -240,7 +241,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846da..3eeb097fde4 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7bc97f84a1c..844af6bd061 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2025, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/fmgrprotos.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 00000000000..b37ea4096cf
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,270 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(8 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(47 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(41 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Index Scan using foo12_1_a on foo12_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(9 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Disabled: true
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(10 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index b53488f76d2..58159bfc574 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 00000000000..f27e8fb521c
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,80 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE TABLE bar (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE bar1 PARTITION OF bar FOR VALUES IN (1);
+ CREATE INDEX ON bar1(a);
+ CREATE TABLE bar2 PARTITION OF bar FOR VALUES IN (2);
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO UPDATE bar SET a = a WHERE a = one();
+ CREATE RULE update_bar AS ON UPDATE TO bar DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo, bar;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Another case with Append with run-time pruning in a subquery
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2dropi" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
--
2.43.0
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2025-02-12 11:53 ` Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
1 sibling, 1 reply; 66+ messages in thread
From: Amit Langote @ 2025-02-12 11:53 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Thu, Feb 6, 2025 at 11:35 AM Amit Langote <[email protected]> wrote:
> Per cfbot-ci, the new test case output in 0002 needed to be updated.
>
> I plan to push 0001 tomorrow, barring any objections.
I pushed that last Friday. With bb3ec16e, d47cbf47, and cbc12791 now in:
* Pruning information is now stored separately from parent plan nodes
in PlannedStmt.
* Initial runtime pruning occurs as a separate step, independent of
and before plan initialization in InitPlan().
* The RT indexes of unprunable relations and those of partitions that
survive initial pruning are stored in a global bitmapset in EState,
allowing us to avoid work that was previously done for pruned
partitions. This was difficult before because initial pruning wasn’t
performed before the parent plan node was initialized, meaning that
the work we aimed to save had already been done.
The final remaining piece is to skip taking locks on partitions pruned
during initial pruning, and the attached patch addresses that.
I’d like to commit the patch next week, barring objections.
--
Thanks, Amit Langote
Attachments:
[application/x-patch] v63-0001-Don-t-lock-partitions-pruned-by-initial-pruning.patch (88.4K, 2-v63-0001-Don-t-lock-partitions-pruned-by-initial-pruning.patch)
From c86d4e4f5b2973831d820af0e9bc96ea2b6b6951 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 12 Feb 2025 18:30:23 +0900
Subject: [PATCH v63] Don't lock partitions pruned by initial pruning
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.
This commit introduces changes that defer locking for such relations
and ensures that if the CachedPlan is invalidated due to concurrent
DDL during this window, replanning is triggered.
* Changes to locking when executing generic plans:
AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily. The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning. This deferral of locks allows runtime-prunable
relations to be handled later during executor startup, reducing
unnecessary locking overhead, particularly for generic plans with
many partitions that are prunable with initial runtime pruning.
This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.
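
As a rough illustration of the two-phase locking described above (not part of the patch), the following toy model uses a 64-bit mask in place of a Bitmapset and a global variable in place of the lock manager. All names here (ToyPlan, toy_acquire_executor_locks, toy_do_initial_pruning) are hypothetical stand-ins; the real AcquireExecutorLocks() and ExecDoInitialPruning() take different arguments and do considerably more work.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of RT indexes: one bit per relation. */
typedef uint64_t RelidSet;

/* Hypothetical, simplified view of the fields of interest in a plan. */
typedef struct ToyPlan
{
    RelidSet all_relids;        /* every relation in the range table */
    RelidSet unprunable_relids; /* relations that pruning can never remove */
} ToyPlan;

/* Toy stand-in for the lock manager: the set of relations locked so far. */
static RelidSet locked;

static void
toy_lock(RelidSet rels)
{
    locked |= rels;
}

/* Phase 1 (plancache side): lock only the unprunable relations. */
static void
toy_acquire_executor_locks(const ToyPlan *plan)
{
    toy_lock(plan->unprunable_relids);
}

/*
 * Phase 2 (executor startup): "surviving" is the result of initial pruning.
 * Only prunable relations that survive are locked here.  Returns the set of
 * unpruned relations, i.e. everything that is locked and safe to open.
 */
static RelidSet
toy_do_initial_pruning(const ToyPlan *plan, RelidSet surviving)
{
    RelidSet prunable = plan->all_relids & ~plan->unprunable_relids;
    RelidSet survivors = surviving & prunable;

    assert((locked & plan->unprunable_relids) == plan->unprunable_relids);
    toy_lock(survivors);
    return plan->unprunable_relids | survivors;
}
```

With five relations of which two are unprunable and one partition survives pruning, only three locks are taken in this model; the two pruned partitions are never locked.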
* Plan invalidation handling:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.
ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value: true if plan initialization succeeded and left a valid
PlanState tree in QueryDesc.planstate, or false if the plan was found
to be invalid, in which case QueryDesc.planstate is NULL. Hook
implementations are required to call standard_ExecutorStart() at the
beginning, and if it returns false, they should return false as well
without proceeding.
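
The retry behavior described in this section can be sketched with a small self-contained model (again, not part of the patch). The Toy* types and toy_* functions are hypothetical; in particular, the pending_invalidations counter merely simulates concurrent DDL being noticed while deferred locks are taken, and the cleanup of the failed executor state that the real code has to perform is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for CachedPlan and QueryDesc. */
typedef struct ToyCachedPlan
{
    bool is_valid;      /* cleared when the plan is invalidated */
    int  generation;    /* bumped each time the plan is rebuilt */
} ToyCachedPlan;

typedef struct ToyQueryDesc
{
    ToyCachedPlan *cplan;
    void *planstate;            /* NULL until startup succeeds */
    int   pending_invalidations; /* simulated concurrent DDL events */
} ToyQueryDesc;

static int dummy_planstate;     /* placeholder for a PlanState tree */

/* Models ExecutorStart(): returns false if the plan turned out invalid. */
static bool
toy_executor_start(ToyQueryDesc *qd)
{
    /* Simulate noticing an invalidation while taking deferred locks. */
    if (qd->pending_invalidations > 0)
    {
        qd->pending_invalidations--;
        qd->cplan->is_valid = false;
    }

    if (!qd->cplan->is_valid)
    {
        qd->planstate = NULL;
        return false;
    }

    qd->planstate = &dummy_planstate;
    return true;
}

/* Models UpdateCachedPlan(): replace the stale plan with a fresh one. */
static void
toy_update_cached_plan(ToyCachedPlan *cplan)
{
    cplan->generation++;
    cplan->is_valid = true;
}

/* Models ExecutorStartCachedPlan(): retry until startup succeeds. */
static int
toy_executor_start_cached_plan(ToyQueryDesc *qd)
{
    int retries = 0;

    while (!toy_executor_start(qd))
    {
        toy_update_cached_plan(qd->cplan);
        retries++;
    }
    return retries;
}
```

In this model a single simulated invalidation causes exactly one replan, after which startup succeeds with a non-NULL planstate.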
* Testing:
To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to concurrent changes to prunable
relations made before the deferred locks are taken.
* Note to extension authors:
ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:
    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;
Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.
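
The kind of guard an extension might apply before touching a child relation can be sketched as follows. This is a toy model, not PostgreSQL code: ToyEState, ToyRelation and the toy_* helpers are hypothetical, and a 64-bit mask stands in for the es_unpruned_relids Bitmapset. A real extension would presumably test membership with bms_is_member() before calling ExecGetRangeTableRelation().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EState, keeping only the unpruned-relids set. */
typedef struct ToyEState
{
    uint64_t unpruned_relids;   /* bit N set => RT index N is locked/unpruned */
} ToyEState;

/* Hypothetical stand-in for an opened relation. */
typedef struct ToyRelation
{
    int rti;
} ToyRelation;

/* True if the RT index is in the unpruned set (RT indexes start at 1). */
static bool
toy_rel_is_unpruned(const ToyEState *estate, int rti)
{
    if (rti <= 0 || rti >= 64)
        return false;
    return ((estate->unpruned_relids >> rti) & 1) != 0;
}

/*
 * Return the relation only if it is known to be unpruned (and thus locked);
 * otherwise return NULL so the caller can skip it instead of hitting an error.
 */
static ToyRelation *
toy_get_relation_if_unpruned(const ToyEState *estate, ToyRelation *rels, int rti)
{
    if (!toy_rel_is_unpruned(estate, rti))
        return NULL;
    return &rels[rti];
}
```

Skipping pruned relations, rather than trying to open them, mirrors the requirement stated above: only relations in the unpruned set are guaranteed to be locked.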
The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.
I (amitlan) have discussed various aspects of this project offlist with
Robert Haas and David Rowley multiple times, and their advice has been
valuable.
Reviewed-by: Robert Haas <[email protected]> (earlier versions)
Reviewed-by: David Rowley <[email protected]> (earlier versions)
Reviewed-by: Tom Lane <[email protected]> (earlier versions)
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 16 +-
.../pg_stat_statements/pg_stat_statements.c | 16 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 5 +-
src/backend/commands/explain.c | 22 +-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 5 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/commands/trigger.c | 15 ++
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 130 ++++++++-
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 38 ++-
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 4 +-
src/backend/executor/spi.c | 29 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 51 +++-
src/backend/utils/cache/plancache.c | 204 ++++++++++++--
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 6 +-
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 34 ++-
src/include/nodes/execnodes.h | 3 +
src/include/utils/plancache.h | 50 +++-
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 66 ++++-
.../expected/cached-plan-inval.out | 250 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 86 ++++++
33 files changed, 1025 insertions(+), 98 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f1ad876e821..82c17c0a28a 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -76,7 +76,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -256,9 +256,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -294,9 +296,13 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
if (auto_explain_enabled())
{
@@ -314,6 +320,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bebf8134eb0..b735381cb0b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -332,7 +332,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -986,13 +986,19 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!plan_valid)
+ return false;
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1015,6 +1021,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return true;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..091fbc12cc5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -556,7 +556,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -566,7 +566,8 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- ExecutorStart(cstate->queryDesc, 0);
+ if (!ExecutorStart(cstate->queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 23cecd99c9e..44b4665ccd3 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -332,12 +332,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ if (!ExecutorStart(queryDesc, GetIntoRelEFlags(into)))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c24e66f82e1..af25c16d215 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -519,7 +519,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -641,7 +642,9 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -697,7 +700,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -711,8 +714,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Prepare the plan for execution. */
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index ba540e3de5b..1b28d20412e 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -907,11 +907,13 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ if (!ExecutorStart(qdesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
ExecutorRun(qdesc, ForwardScanDirection, 0);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c12817091ed..0bfbc5ca6dc 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,12 +438,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index e7c8171c102..4c2ac045224 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -117,6 +117,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 8989c0c882d..c025b1f9f8c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -582,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int query_index = 0;
if (es->memory)
{
@@ -654,7 +656,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -665,6 +668,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ query_index++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 97c087929f3..565f0db0b9a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5057,6 +5057,21 @@ AfterTriggerBeginQuery(void)
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by standard_ExecutorEnd() if the query execution was aborted due to
+ * the plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
+
/* ----------
* AfterTriggerEndQuery()
*
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be613..449c6068ae9 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+flag. If the plan tree is outdated (is_valid = false), the executor stops
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartCachedPlan
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), control is
+immediately returned to ExecutorStartCachedPlan(), which will create a new plan
+tree and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
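
The retry behavior described in the README hunks above can be pictured with a toy model. This is only a sketch, not PostgreSQL code: ToyPlan, ToyExec and every toy_* function are hypothetical stand-ins for CachedPlan, EState, ExecutorStart(), UpdateCachedPlan() and ExecutorStartCachedPlan(), and the "invalidation" is simulated with a counter rather than real sinval messages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CachedPlan: only the validity flag matters here. */
typedef struct ToyPlan
{
    bool is_valid;      /* models CachedPlan.is_valid */
    int  generation;    /* bumped each time the plan is rebuilt */
} ToyPlan;

/* Hypothetical stand-in for the executor state of one query. */
typedef struct ToyExec
{
    ToyPlan *plan;
    int      pending_invalidations; /* simulated concurrent DDL events */
    int      start_attempts;        /* how many times startup was tried */
    bool     initialized;           /* models a usable PlanState tree */
} ToyExec;

/* Models taking the deferred locks; an invalidation may be noticed here. */
static void
toy_lock_surviving_partitions(ToyExec *ex)
{
    if (ex->pending_invalidations > 0)
    {
        ex->pending_invalidations--;
        ex->plan->is_valid = false;
    }
}

/* Models ExecutorStart(): returns false if the plan went stale. */
static bool
toy_start(ToyExec *ex)
{
    ex->start_attempts++;
    toy_lock_surviving_partitions(ex);
    if (!ex->plan->is_valid)
        return false;           /* stop before initializing plan nodes */
    ex->initialized = true;
    return true;
}

/* Models replanning; only legal on a plan that is currently invalid. */
static void
toy_replan(ToyPlan *plan)
{
    assert(!plan->is_valid);
    plan->generation++;
    plan->is_valid = true;
}

/* Models the retry loop: start, and on failure clean up, replan, retry. */
static void
toy_start_cached_plan(ToyExec *ex)
{
    while (!toy_start(ex))
    {
        ex->initialized = false;    /* discard any partial state */
        toy_replan(ex->plan);
    }
}
```

In this model, a caller that sets pending_invalidations to 2 would see three startup attempts and two replans before initialization succeeds. The real code additionally has to reset AFTER trigger state and release executor resources on each failed attempt, which the sketch leaves out.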
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 39d80ccfbad..0764a5d8855 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,11 +55,13 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -114,11 +116,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Return value indicates if the plan has been initialized successfully so
+ * that queryDesc->planstate contains a valid PlanState tree. It may not
+ * if the plan got invalidated during InitPlan().
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -130,12 +137,14 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
+ plan_valid = (*ExecutorStart_hook) (queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ return plan_valid;
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
EState *estate;
@@ -259,6 +268,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ return ExecPlanStillValid(queryDesc->estate);
+}
+
+/*
+ * ExecutorStartCachedPlan
+ * Start execution for a given query in the CachedPlanSource, replanning
+ * if the plan is invalidated due to deferred locks taken during the
+ * plan's initialization
+ *
+ * This function handles cases where the CachedPlan given in queryDesc->cplan
+ * might become invalid during the initialization of the plan given in
+ * queryDesc->plannedstmt, particularly when prunable relations in it are
+ * locked after performing initial pruning. If the locks invalidate the plan,
+ * the function calls UpdateCachedPlan() to replan all queries in the
+ * CachedPlan, and then retries initialization.
+ *
+ * The function repeats the process until ExecutorStart() successfully
+ * initializes the plan, that is, without the CachedPlan becoming invalid.
+ */
+void
+ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (unlikely(queryDesc->cplan == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
+ if (unlikely(plansource == NULL))
+ elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlanSource");
+
+ /*
+ * Loop and retry with an updated plan until no further invalidation
+ * occurs.
+ */
+ while (1)
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /*
+ * Clean up the current execution state before creating the new
+ * plan to retry ExecutorStart(). Mark execution as aborted to
+ * ensure that AFTER trigger state is properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ /* Retry ExecutorStart() with an updated plan tree. */
+ queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+ }
+ else
+
+ /*
+ * Exit the loop if the plan is initialized successfully and no
+ * sinval messages were received that invalidated the CachedPlan.
+ */
+ break;
+ }
}
/* ----------------------------------------------------------------
@@ -317,6 +384,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -423,8 +491,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -487,11 +558,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
(PgStat_Counter) estate->es_parallel_workers_launched);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -505,6 +575,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -603,6 +681,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -828,6 +921,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If the plan originates from a CachedPlan (given in queryDesc->cplan),
+ * it can become invalid during runtime "initial" pruning when the
+ * remaining set of locks is taken. The function returns early in that
+ * case without initializing the plan, and the caller is expected to
+ * retry with a new valid plan.
* ----------------------------------------------------------------
*/
static void
@@ -835,6 +934,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -855,6 +955,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -865,9 +966,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* executed, are saved in es_part_prune_results. These results correspond
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
*/
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2871,6 +2978,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 134ff62f5cb..1bedb808368 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1258,8 +1258,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1440,7 +1447,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ if (!ExecutorStart(queryDesc, fpes->eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b6e89d0620d..432eeaf9034 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -1768,7 +1769,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * all plan nodes that contain a PartitionPruneInfo.
+ * all plan nodes that contain a PartitionPruneInfo. This also locks the
+ * leaf partitions whose subnodes will be initialized if needed.
*
* ExecInitPartitionExecPruning:
* Updates the PartitionPruneState found at given part_prune_index in
@@ -1789,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * plan nodes that support partition pruning.
+ * plan nodes that support partition pruning. This also locks the leaf
+ * partitions whose subnodes will be initialized if needed.
*
* This function iterates over each PartitionPruneInfo entry in
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
@@ -1816,6 +1820,7 @@ void
ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
@@ -1841,11 +1846,40 @@ ExecDoInitialPruning(EState *estate)
else
validsubplan_rtis = all_leafpart_rtis;
+ if (ExecShouldLockRelations(estate))
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ locked_relids = lappend_int(locked_relids, rtindex);
+ }
+ }
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
+
+ /*
+ * Release the useless locks if the plan won't be executed. This is the
+ * same as what CheckCachedPlan() in plancache.c does.
+ */
+ if (!ExecPlanStillValid(estate))
+ {
+ foreach(lc, locked_relids)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
/*
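
As a rough illustration of the locking change to ExecDoInitialPruning() above: the function locks every surviving partition, remembers what it locked, and only afterwards checks whether the plan is still valid, releasing the locks if it is not. The sketch below models just that bookkeeping. ToyLockTable, toy_lock_survivors() and the invalidating_rel parameter are hypothetical and exist only for illustration; real lock acquisition goes through LockRelationOid() and the lock manager.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RELS 8

/* Hypothetical lock table: one hold count per relation slot. */
typedef struct ToyLockTable
{
    int held[TOY_MAX_RELS];
} ToyLockTable;

/*
 * Lock every relation listed in 'survivors', remembering each one.
 * Locking 'invalidating_rel' (use -1 for none) simulates an invalidation
 * being noticed while that lock is taken.  Validity is checked only after
 * all locks are taken; if the plan went stale, every lock taken here is
 * released again.  Returns the number of locks still held on return.
 */
static int
toy_lock_survivors(ToyLockTable *locks, const int *survivors, int nsurvivors,
                   int invalidating_rel, bool *plan_valid)
{
    int locked[TOY_MAX_RELS];
    int nlocked = 0;

    assert(nsurvivors >= 0 && nsurvivors <= TOY_MAX_RELS);

    for (int i = 0; i < nsurvivors; i++)
    {
        int rel = survivors[i];

        assert(rel >= 0 && rel < TOY_MAX_RELS);
        locks->held[rel]++;
        locked[nlocked++] = rel;
        if (rel == invalidating_rel)
            *plan_valid = false;
    }

    if (!*plan_valid)
    {
        /* The plan will not be executed, so these locks are useless. */
        for (int i = 0; i < nlocked; i++)
            locks->held[locked[i]]--;
        return 0;
    }

    return nlocked;
}
```

Keeping a separate list of what was locked, instead of unlocking everything in the surviving set, matters in the real code because the locks are only taken when ExecShouldLockRelations() says so.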
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 448fc09aaac..39d6f4d819e 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -813,6 +814,10 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed in ExecEndPlan().
+ *
+ * Note: The caller must ensure that 'rti' refers to an unpruned relation
+ * (i.e., it is a member of estate->es_unpruned_relids) before calling this
+ * function. Attempting to open a pruned relation will result in an error.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -821,6 +826,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ if (!bms_is_member(rti, estate->es_unpruned_relids))
+ elog(ERROR, "trying to open a pruned relation");
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 757f8068e21..6aa8e9c4d8a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -864,7 +865,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+ if (!ExecutorStart(es->qd, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index ecb2e4ccaa1..3288396def3 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1685,7 +1686,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2500,6 +2502,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int query_index = 0;
spicallbackarg.query = plansource->query_string;
@@ -2690,14 +2693,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, query_index);
FreeQueryDesc(qdesc);
}
else
@@ -2794,6 +2799,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ query_index++;
}
/* Done with this plan, so release refcount */
@@ -2871,7 +2878,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2927,7 +2935,16 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, eflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
ExecutorRun(queryDesc, ForwardScanDirection, tcount);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5655348a2e2..f60f2785bc1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1224,6 +1224,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2025,7 +2026,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6f22496305a..dea24453a6c 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -36,6 +37,9 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +69,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +82,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +128,9 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +143,9 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,14 +157,23 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ if (queryDesc->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, 0, plansource, query_index);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, 0))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* Run the plan to completion.
@@ -493,6 +514,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -512,9 +534,19 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Prepare the plan for execution.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (portal->cplan)
+ {
+ ExecutorStartCachedPlan(queryDesc, myeflags,
+ portal->plansource, 0);
+ Assert(queryDesc->planstate);
+ }
+ else
+ {
+ if (!ExecutorStart(queryDesc, myeflags))
+ elog(ERROR, "ExecutorStart() failed unexpectedly");
+ }
/*
* This tells PortalCleanup to shut down the executor
@@ -1188,6 +1220,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int query_index = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1269,6 +1302,9 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1278,6 +1314,9 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
+ portal->plansource,
+ query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1342,6 +1381,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ query_index++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 55db8f53705..7ffd9dafae0 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,7 +101,8 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ bool release_generic);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -578,10 +579,17 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
+ *
+ * This also releases and drops the generic plan (plansource->gplan), if any,
+ * as most callers will typically build a new CachedPlan for the plansource
+ * right after this. However, when called from UpdateCachedPlan(), the
+ * function does not release the generic plan, as UpdateCachedPlan() updates
+ * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv)
+ QueryEnvironment *queryEnv,
+ bool release_generic)
{
bool snapshot_set;
RawStmt *rawtree;
@@ -678,8 +686,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference if any */
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference, if any, and if requested */
+ if (release_generic)
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -815,8 +824,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. However, the plans are not fully
+ * race-condition-free until the executor acquires locks on the prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -870,7 +882,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
- /* Successfully revalidated and locked the query. */
+ /*
+ * Successfully revalidated and locked the query. Set is_reused
+ * to true so that CachedPlanRequiresLocking() returns true.
+ */
+ plan->is_reused = true;
return true;
}
@@ -895,12 +911,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* To build a generic, parameter-value-independent plan, pass NULL for
* boundParams. To build a custom plan, pass the actual parameter values via
* boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * each parameter value; otherwise the planner will treat the value as a hint
+ * rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -911,6 +929,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
+ MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -928,7 +947,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/*
* If we don't already have a copy of the querytree list that can be
@@ -967,10 +986,19 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally we make a dedicated memory context for the CachedPlan and its
- * subsidiary data. (It's probably not going to be large, but just in
- * case, allow it to grow large. It's transient for the moment.) But for
- * a one-shot plan, we just leave it in the caller's memory context.
+ * Normally, we create a dedicated memory context for the CachedPlan and
+ * its subsidiary data. Although it's usually not very large, the context
+ * is designed to allow growth if necessary.
+ *
+ * The PlannedStmts are stored in a separate child context (stmt_context)
+ * of the CachedPlan's memory context. This separation allows
+ * UpdateCachedPlan() to free and replace the PlannedStmts without
+ * affecting the CachedPlan structure or its stmt_list List.
+ *
+ * For one-shot plans, we instead use the caller's memory context, as the
+ * CachedPlan will not persist. stmt_context will be set to NULL in this
+ * case, because UpdateCachedPlan() should never get called on a one-shot
+ * plan.
*/
if (!plansource->is_oneshot)
{
@@ -979,12 +1007,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- /*
- * Copy plan into the new context.
- */
- MemoryContextSwitchTo(plan_context);
+ stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan PlannedStmts",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
+ MemoryContextSetParent(stmt_context, plan_context);
+ MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
+
+ MemoryContextSwitchTo(plan_context);
+ plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1025,8 +1058,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
+ plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
+ plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1037,6 +1072,113 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
return plan;
}
+/*
+ * UpdateCachedPlan
+ * Create fresh plans for all queries in the CachedPlanSource, replacing
+ * those in the generic plan's stmt_list, and return the plan for the
+ * query_index'th query.
+ *
+ * This function is primarily used by ExecutorStartCachedPlan() to handle
+ * cases where the original generic CachedPlan becomes invalid. Such
+ * invalidation may occur when prunable relations in the old plan for the
+ * query_index'th query are locked in preparation for execution.
+ *
+ * Note that invalidations received during the execution of the query_index'th
+ * query can affect both the queries that have already finished execution
+ * (e.g., due to concurrent modifications on prunable relations that were not
+ * locked during their execution) and also the queries that have not yet been
+ * executed. As a result, this function updates all plans to ensure
+ * CachedPlan.is_valid is safely set to true.
+ *
+ * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
+ * the caller and any of its callers must not rely on them remaining accessible
+ * after this function is called.
+ */
+PlannedStmt *
+UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ ListCell *l1,
+ *l2;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
+ else if (plan->is_oneshot)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan() returned
+ * the CachedPlan. See the comment in BuildCachedPlan() for details on why
+ * this might happen. Although invalidation is likely a false positive as
+ * stated there, we revalidate the plansource to ensure the query list used
+ * for planning is up to date.
+ *
+ * The risk of catching an invalidation is higher here than when
+ * BuildCachedPlan() is called from GetCachedPlan(), because this function
+ * is normally called long after GetCachedPlan() returns the CachedPlan,
+ * so much more processing could have occurred including things that mark
+ * the CachedPlanSource invalid.
+ *
+ * Note: Do not release plansource->gplan, because the upstream callers
+ * (such as the callers of ExecutorStartCachedPlan()) would still be
+ * referencing it.
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv, false);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for all the queries after making a copy to be
+ * scribbled on by the planner.
+ */
+ query_list = copyObject(query_list);
+
+ /*
+ * Planning work is done in the caller's memory context. The resulting
+ * PlannedStmts are then copied into plan->stmt_context after throwing away
+ * the old ones.
+ */
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+ Assert(list_length(plan_list) == list_length(plan->stmt_list));
+
+ MemoryContextReset(plan->stmt_context);
+ oldcxt = MemoryContextSwitchTo(plan->stmt_context);
+ forboth(l1, plan_list, l2, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst(l1);
+
+ lfirst(l2) = copyObject(plannedstmt);
+ }
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * XXX Should this also (re)set the properties of the CachedPlan that are
+ * set in BuildCachedPlan() after creating the fresh plans such as
+ * planRoleId, dependsOnRole, and saved_xmin?
+ */
+
+ /*
+ * We've updated all the plans that might have been invalidated, so mark
+ * the CachedPlan as valid.
+ */
+ plan->is_valid = true;
+
+ /* Also update generic_cost because we just created a new generic plan. */
+ plansource->generic_cost = cached_plan_cost(plan, false);
+
+ return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
+}
+
/*
* choose_custom_plan: choose whether to use custom or generic plan
*
@@ -1153,8 +1295,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid, but not all locks are acquired if the
+ * returned plan is a reused generic plan. In such cases, locks on relations
+ * subject to initial runtime pruning are not taken by CheckCachedPlan() but
+ * deferred until the execution startup phase, specifically when
+ * ExecDoInitialPruning() performs initial pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1180,7 +1325,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1654,7 +1799,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv);
+ RevalidateCachedQuery(plansource, queryEnv, true);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -1776,7 +1921,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,13 +1939,16 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 0be1c2b0fff..e3526e78064 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f4..570e7cad1fa 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,10 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 2ed2c4bb378..4180601dcd4 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..ba53305ad42 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 30e2a82346f..d12e3f451d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -72,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -191,8 +192,11 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -255,6 +259,30 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
+
+/*
+ * Locks are needed only if running a cached plan that might contain unlocked
+ * relations, such as a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e2d1dc1e067..d838b76bb7e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -657,6 +658,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
@@ -709,6 +711,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 46072d311b1..2d83f7d4930 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,8 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
+#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -139,10 +141,11 @@ typedef struct CachedPlanSource
* The reference count includes both the link from the parent CachedPlanSource
* (if any), and any active plan executions, so the plan can be discarded
* exactly when refcount goes to zero. Both the struct itself and the
- * subsidiary data live in the context denoted by the context field.
- * This makes it easy to free a no-longer-needed cached plan. (However,
- * if is_oneshot is true, the context does not belong solely to the CachedPlan
- * so no freeing is possible.)
+ * subsidiary data, except the PlannedStmts in stmt_list, live in the context
+ * denoted by the context field; the PlannedStmts live in the context denoted
+ * by stmt_context. Separate contexts make it easy to free a no-longer-needed
+ * cached plan. (However, if is_oneshot is true, the context does not belong
+ * solely to the CachedPlan so no freeing is possible.)
*/
typedef struct CachedPlan
{
@@ -150,6 +153,7 @@ typedef struct CachedPlan
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
+ bool is_reused; /* is it a reused generic plan? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
@@ -158,6 +162,10 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext stmt_context; /* context containing the PlannedStmts in
+ * stmt_list, but not the List itself which is
+ * in the above context; NULL if is_oneshot is
+ * true. */
} CachedPlan;
/*
@@ -223,6 +231,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -235,4 +247,34 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire additional locks?
+ *
+ * If the plan is a reused generic plan, the executor must acquire locks for
+ * relations that are not covered by AcquireExecutorLocks(), such as partitions
+ * that are subject to initial runtime pruning.
+ *
+ * Note: These locks are unnecessary if the plan is executed immediately after
+ * its creation, since the planner would have already acquired them. However,
+ * we do not optimize for that case.
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return !cplan->is_oneshot && cplan->is_reused;
+}
+
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check if the plan has not been invalidated after
+ * taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 0b62143af8b..ddee031f551 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -240,7 +241,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846da..3eeb097fde4 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7bc97f84a1c..ad22bc9f2a8 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2025, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/fmgrprotos.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -87,9 +128,22 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 00000000000..444e06b43e4
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,250 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+-----------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Index Scan using foo1_1_a on foo1_1 foo_1
+ Index Cond: (a = $1)
+(5 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo1_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo1_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Index Scan using foo1_1_a on foo1_1 foo_1
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo1_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo1_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Index Only Scan using foo1_1_a on foo1_1 foo_1
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on bar1 bar_1
+ Filter: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Index Scan using foo1_1_a on foo1_1 foo_1
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on bar1 bar_1
+ Filter: (a = one())
+
+Update on foo
+ Update on foo1_1 foo_1
+ Update on foo1_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Index Scan using foo1_1_a on foo1_1 foo_1
+ Index Cond: (a = ANY (ARRAY[one(), two()]))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(37 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo1_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo1_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on bar1 bar_1
+ Filter: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo1_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on bar1 bar_1
+ Filter: (a = one())
+
+Update on foo
+ Update on foo1_1 foo_1
+ Update on foo1_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo1_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo1_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(37 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Index Scan using foo1_1_a on foo1_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(9 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo1_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo1_1 foo_1
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(9 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index b53488f76d2..58159bfc574 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 00000000000..f0cf06f9060
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,86 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# is correctly detected, causing execution to be retried with an updated plan.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY RANGE (a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES FROM (MINVALUE) TO (3) PARTITION BY RANGE (a);
+ CREATE TABLE foo1_1 PARTITION OF foo1 FOR VALUES FROM (MINVALUE) TO (2);
+ CREATE TABLE foo1_2 PARTITION OF foo1 FOR VALUES FROM (2) TO (3);
+ CREATE INDEX foo1_1_a ON foo1_1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES FROM (3) TO (MAXVALUE);
+ INSERT INTO foo SELECT generate_series(-1000, 1000);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE TABLE bar (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE bar1 PARTITION OF bar FOR VALUES IN (1);
+ CREATE INDEX ON bar1(a);
+ CREATE TABLE bar2 PARTITION OF bar FOR VALUES IN (2);
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO UPDATE bar SET a = a WHERE a = one();
+ CREATE RULE update_bar AS ON UPDATE TO bar DO ALSO SELECT 1;
+ ANALYZE;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo, bar;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo1_1_a; }
+
+# In all permutations below, while "s1exec", "s1exec2", etc. wait to
+# acquire the advisory lock, "s2dropi" drops the index being used in the
+# cached plan. When "s1exec" and others are unblocked and begin initializing
+# the plan, including acquiring necessary locks on partitions, the concurrent
+# index drop is detected. This causes plan initialization to be aborted,
+# prompting the caller to retry with a new plan.
+
+# Case with runtime pruning using EXTERN parameter
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+
+# Case with runtime pruning using stable function
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+
+# Case with a rule adding another query causing the CachedPlan to contain
+# multiple PlannedStmts
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+
+# Case with run-time pruning inside a subquery
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
--
2.43.0
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-21 03:40 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Wed, Feb 12, 2025 at 8:53 PM Amit Langote <[email protected]> wrote:
> On Thu, Feb 6, 2025 at 11:35 AM Amit Langote <[email protected]> wrote:
> > Per cfbot-ci, the new test case output in 0002 needed to be updated.
> >
> > I plan to push 0001 tomorrow, barring any objections.
>
> I pushed that last Friday. With bb3ec16e, d47cbf47, and cbc12791 now in:
>
> * Pruning information is now stored separately from parent plan nodes
> in PlannedStmt.
>
> * Initial runtime pruning occurs as a separate step, independent of
> and before plan initialization in InitPlan().
>
> * The RT indexes of unprunable relations and those of partitions that
> survive initial pruning are stored in a global bitmapset in EState,
> allowing us to avoid work that was previously done for pruned
> partitions. This was difficult before because initial pruning wasn’t
> performed before the parent plan node was initialized, meaning that
> the work we aimed to save had already been done.
>
> The final remaining piece is to skip taking locks on partitions pruned
> during initial pruning, and the attached patch addresses that.
>
> I’d like to commit the patch next week, barring objections.
I pushed the final piece yesterday.
Thank you to all who have commented on this thread, reviewed the patches
in their various incarnations, and offered advice here or offlist.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Tom Lane @ 2025-02-21 06:04 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Amit Langote <[email protected]> writes:
> I pushed the final piece yesterday.
trilobite reports that this fails under -DCLOBBER_CACHE_ALWAYS:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2025-02-20%2019%3A37%3A12
regards, tom lane
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-21 06:36 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Fri, Feb 21, 2025 at 3:04 PM Tom Lane <[email protected]> wrote:
>
> Amit Langote <[email protected]> writes:
> > I pushed the final piece yesterday.
>
> trilobite reports that this fails under -DCLOBBER_CACHE_ALWAYS:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2025-02-20%2019%3A37%3A12
Looking, thanks for the heads up.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-21 08:07 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Fri, Feb 21, 2025 at 3:36 PM Amit Langote <[email protected]> wrote:
> On Fri, Feb 21, 2025 at 3:04 PM Tom Lane <[email protected]> wrote:
> >
> > Amit Langote <[email protected]> writes:
> > > I pushed the final piece yesterday.
> >
> > trilobite reports that this fails under -DCLOBBER_CACHE_ALWAYS:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2025-02-20%2019%3A37%3A12
>
> Looking, thanks for the heads up.
The short of it is that the cached-plan-inval test in the
delay_execution suite can never be made to work under
CLOBBER_CACHE_ALWAYS. The test assumes that locks on partitions for a
reused generic plan are not taken until InitPlan(). However, under
CLOBBER_CACHE_ALWAYS, generic plans are never reused, so the test's
assumption never holds.
I see two possible ways to address this:
1. Find a way to disable the cached-plan-inval test in
CLOBBER_CACHE_ALWAYS builds. However, I haven't found any other test
that does this.
2. Remove the test altogether, though that might be too drastic.
Thoughts?
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Tom Lane @ 2025-02-21 15:55 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Amit Langote <[email protected]> writes:
> The short of it is that the cached-plan-inval test in the
> delay_execution suite can never be made to work under
> CLOBBER_CACHE_ALWAYS. The test assumes that locks on partitions for a
> reused generic plan are not taken until InitPlan(). However, under
> CLOBBER_CACHE_ALWAYS, generic plans are never reused, so the test's
> assumption never holds.
Ugh.
> I see two possible ways to address this:
> 1. Find a way to disable the cached-plan-inval test in
> CLOBBER_CACHE_ALWAYS builds. However, I haven't found any other test
> that does this.
> 2. Remove the test altogether, though that might be too drastic.
Well, you could force matters with "set debug_discard_caches = 0"
within the test, but I think that's just a band-aid that would
not make the test fully stable. The point of CLOBBER_CACHE_ALWAYS
is to model random arrival of cache flush events, which is *always*
a possibility due to background activity (autovacuum for instance).
We do have a couple of other regression tests that rely on
"set debug_discard_caches = 0", and I've not seen many buildfarm
failures tracing to that, but I don't trust it a whole lot.
How badly do you want to keep this test case? It seems fairly
rickety to me, even without this particular concern.
regards, tom lane
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-22 02:13 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Sat, Feb 22, 2025 at 12:55 AM Tom Lane <[email protected]> wrote:
> Amit Langote <[email protected]> writes:
> > The short of it is that the cached-plan-inval test in the
> > delay_execution suite can never be made to work under
> > CLOBBER_CACHE_ALWAYS. The test assumes that locks on partitions for a
> > reused generic plan are not taken until InitPlan(). However, under
> > CLOBBER_CACHE_ALWAYS, generic plans are never reused, so the test's
> > assumption never holds.
>
> Ugh.
>
> > I see two possible ways to address this:
>
> > 1. Find a way to disable the cached-plan-inval test in
> > CLOBBER_CACHE_ALWAYS builds. However, I haven't found any other test
> > that does this.
>
> > 2. Remove the test altogether, though that might be too drastic.
>
> Well, you could force matters with "set debug_discard_caches = 0"
> within the test, but I think that's just a band-aid that would
> not make the test fully stable. The point of CLOBBER_CACHE_ALWAYS
> is to model random arrival of cache flush events, which is *always*
> a possibility due to background activity (autovacuum for instance).
>
> We do have a couple of other regression tests that rely on
> "set debug_discard_caches = 0", and I've not seen many buildfarm
> failures tracing to that, but I don't trust it a whole lot.
>
> How badly do you want to keep this test case? It seems fairly
> rickety to me, even without this particular concern.
Hmm, yeah, I have to admit that even if we address this specific
issue, the risk of this test failing again outweighs the likelihood of
it catching a real breakage in the deferred lock mechanism.
I'll remove the test for now.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-22 06:29 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Sat, Feb 22, 2025 at 11:13 AM Amit Langote <[email protected]> wrote:
> On Sat, Feb 22, 2025 at 12:55 AM Tom Lane <[email protected]> wrote:
> > Amit Langote <[email protected]> writes:
> > > The short of it is that the cached-plan-inval test in the
> > > delay_execution suite can never be made to work under
> > > CLOBBER_CACHE_ALWAYS. The test assumes that locks on partitions for a
> > > reused generic plan are not taken until InitPlan(). However, under
> > > CLOBBER_CACHE_ALWAYS, generic plans are never reused, so the test's
> > > assumption never holds.
> >
> > Ugh.
> >
> > > I see two possible ways to address this:
> >
> > > 1. Find a way to disable the cached-plan-inval test in
> > > CLOBBER_CACHE_ALWAYS builds. However, I haven't found any other test
> > > that does this.
> >
> > > 2. Remove the test altogether, though that might be too drastic.
> >
> > Well, you could force matters with "set debug_discard_caches = 0"
> > within the test, but I think that's just a band-aid that would
> > not make the test fully stable. The point of CLOBBER_CACHE_ALWAYS
> > is to model random arrival of cache flush events, which is *always*
> > a possibility due to background activity (autovacuum for instance).
> >
> > We do have a couple of other regression tests that rely on
> > "set debug_discard_caches = 0", and I've not seen many buildfarm
> > failures tracing to that, but I don't trust it a whole lot.
> >
> > How badly do you want to keep this test case? It seems fairly
> > rickety to me, even without this particular concern.
>
> Hmm, yeah, I have to admit that even if we address this specific
> issue, the risk of this test failing again outweighs the likelihood of
> it catching a real breakage in the deferred lock mechanism.
>
> I'll remove the test for now.
Done. I'll try to think of a more robust testing approach for this,
but I’m not very optimistic :-(.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Alexander Lakhin @ 2025-02-22 15:00 UTC (permalink / raw)
To: Amit Langote <[email protected]>; Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
Hello Amit,
21.02.2025 05:40, Amit Langote wrote:
> I pushed the final piece yesterday.
Please look at a new error produced by the following script,
starting from 525392d57:
CREATE TABLE t(id int) PARTITION BY RANGE (id);
CREATE INDEX idx on t(id);
CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (10) TO (20);
CREATE TABLE tp_2 PARTITION OF t FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(id);
CREATE TABLE tp_2_1 PARTITION OF tp_2 FOR VALUES FROM (21) to (22);
CREATE TABLE tp_2_2 PARTITION OF tp_2 FOR VALUES FROM (22) to (23);
CREATE FUNCTION stable_one() RETURNS INT AS $$ BEGIN RETURN 1; END; $$ LANGUAGE plpgsql STABLE;
SELECT min(id) OVER (PARTITION BY id ORDER BY id) FROM t WHERE id >= stable_one();
ERROR: XX000: trying to open a pruned relation
LOCATION: ExecGetRangeTableRelation, execUtils.c:830
This issue was discovered with SQLsmith.
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
* Re: generic plans and "initial" pruning
From: Tender Wang @ 2025-02-22 17:02 UTC (permalink / raw)
To: Alexander Lakhin <[email protected]>; +Cc: Amit Langote <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
Alexander Lakhin <[email protected]> wrote on Sat, Feb 22, 2025 at 23:00:
> Hello Amit,
>
> 21.02.2025 05:40, Amit Langote wrote:
>
> I pushed the final piece yesterday.
>
>
> Please look at new error, produced by the following script,
> starting from 525392d57:
> CREATE TABLE t(id int) PARTITION BY RANGE (id);
> CREATE INDEX idx on t(id);
> CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (10) TO (20);
> CREATE TABLE tp_2 PARTITION OF t FOR VALUES FROM (20) TO (30) PARTITION BY
> RANGE(id);
> CREATE TABLE tp_2_1 PARTITION OF tp_2 FOR VALUES FROM (21) to (22);
> CREATE TABLE tp_2_2 PARTITION OF tp_2 FOR VALUES FROM (22) to (23);
> CREATE FUNCTION stable_one() RETURNS INT AS $$ BEGIN RETURN 1; END; $$
> LANGUAGE plpgsql STABLE;
>
> SELECT min(id) OVER (PARTITION BY id ORDER BY id) FROM t WHERE id >=
> stable_one();
>
> ERROR: XX000: trying to open a pruned relation
> LOCATION: ExecGetRangeTableRelation, execUtils.c:830
>
> This issue was discovered with SQLsmith.
>
The error message was added in commit 525392d57. In this case,
estate->es_unpruned_relids contains only 1, which is the RT index of
table t.
In register_partpruneinfo(), we collect glob->prunableRelids; in this case,
it contains 2,3,4,5. Then we do:
result->unprunableRelids = bms_difference(glob->allRelids,
glob->prunableRelids);
so result->unprunableRelids contains only 1.
But tp_2 is also a partitioned table, and its partpruneinfo, created by
create_append_plan(), is put at the head of the global list.
So we process it first in ExecDoInitialPruning(). The error is then reported
because estate->es_unpruned_relids contains only 1.
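The set arithmetic described above can be sketched in a few lines of C.
This is only a toy model, not PostgreSQL code: RelidSet is a plain 32-bit
mask standing in for a Bitmapset, every function name below is
hypothetical, and the RT index values (1 for t, 2..5 for the partitions)
are simply taken from the numbers quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit N set means RT index N is a member. */
typedef uint32_t RelidSet;

/* Single-member set {rti}; only RT indexes 0..31 fit in this toy model. */
RelidSet relid_bit(int rti)
{
    assert(rti >= 0 && rti < 32);
    return (RelidSet) 1 << rti;
}

/* Analogue of bms_difference(a, b): members of a that are not in b. */
RelidSet relids_difference(RelidSet a, RelidSet b)
{
    return a & ~b;
}

/* Analogue of bms_is_member(rti, set). */
bool relids_is_member(int rti, RelidSet set)
{
    return (set & relid_bit(rti)) != 0;
}

/* Build the set {lo, ..., hi}. */
RelidSet relids_range(int lo, int hi)
{
    RelidSet set = 0;

    for (int rti = lo; rti <= hi; rti++)
        set |= relid_bit(rti);
    return set;
}

/*
 * The arithmetic from the report: allRelids = {1..5} and
 * prunableRelids = {2..5}, so only RT index 1 remains unprunable.
 */
RelidSet example_unprunable_relids(void)
{
    RelidSet allRelids = relids_range(1, 5);
    RelidSet prunableRelids = relids_range(2, 5);

    return relids_difference(allRelids, prunableRelids);
}
```

In this model, relids_is_member() is false for every index in 2..5,
including whichever one belongs to tp_2, which corresponds to the
condition that makes the executor refuse to open the relation.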
--
Thanks,
Tender Wang
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-23 08:35 UTC (permalink / raw)
To: Tender Wang <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Sun, Feb 23, 2025 at 2:03 AM Tender Wang <[email protected]> wrote:
> Alexander Lakhin <[email protected]> wrote on Sat, Feb 22, 2025 at 23:00:
>> 21.02.2025 05:40, Amit Langote wrote:
>>
>> I pushed the final piece yesterday.
>>
>>
>> Please look at new error, produced by the following script,
>> starting from 525392d57:
>> CREATE TABLE t(id int) PARTITION BY RANGE (id);
>> CREATE INDEX idx on t(id);
>> CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (10) TO (20);
>> CREATE TABLE tp_2 PARTITION OF t FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(id);
>> CREATE TABLE tp_2_1 PARTITION OF tp_2 FOR VALUES FROM (21) to (22);
>> CREATE TABLE tp_2_2 PARTITION OF tp_2 FOR VALUES FROM (22) to (23);
>> CREATE FUNCTION stable_one() RETURNS INT AS $$ BEGIN RETURN 1; END; $$ LANGUAGE plpgsql STABLE;
>>
>> SELECT min(id) OVER (PARTITION BY id ORDER BY id) FROM t WHERE id >= stable_one();
>>
>> ERROR: XX000: trying to open a pruned relation
>> LOCATION: ExecGetRangeTableRelation, execUtils.c:830
>>
>> This issue was discovered with SQLsmith.
Thanks for the report.
> The error message was added in commit 525392d57. In this case, estate->es_unpruned_relids contains only 1, which is the RT index of table t.
> In register_partpruneinfo(), we collect glob->prunableRelids; in this case, it contains 2,3,4,5. Then we do:
> result->unprunableRelids = bms_difference(glob->allRelids,
> glob->prunableRelids);
> so result->unprunableRelids contains only 1.
>
> But tp_2 is also a partitioned table, and its partpruneinfo, created by create_append_plan(), is put at the head of the global list.
> So we process it first in ExecDoInitialPruning(). The error is then reported because estate->es_unpruned_relids contains only 1.
Thanks for checking.

The RT index of tp_2 should appear in PlannedStmt.unprunableRelids,
because it needs to be opened in CreatePartitionPruneState() for
setting up its PartitionPruneInfo. We use ExecGetRangeTableRelation()
to open it, which expects the relation to be locked; hence the error.

To ensure tp_2 appears in PlannedStmt.unprunableRelids, we should
prevent make_partitionedrel_pruneinfo() from placing the RT index into
leafpart_rti_map[], as the current condition for inclusion doesn’t
account for whether the partition is itself partitioned.

I've come up with the attached.
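
To illustrate the set arithmetic discussed above, here is a minimal,
standalone sketch. It does not use PostgreSQL's Bitmapset; the relid_set
type and its helpers are hypothetical stand-ins, and the RT index
assignment (1 = t, 2 = tp_1, 3 = tp_2, 4 = tp_2_1, 5 = tp_2_2) is an
assumption based on the diagnosis quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit N set means RT index N is a member. */
typedef uint32_t relid_set;

static relid_set
relid_set_add(relid_set s, int rti)
{
    assert(rti > 0 && rti < 32);
    return s | ((relid_set) 1 << rti);
}

static int
relid_set_is_member(relid_set s, int rti)
{
    return (int) ((s >> rti) & 1);
}

/* Same effect as a set difference: members of a that are not in b. */
static relid_set
relid_set_difference(relid_set a, relid_set b)
{
    return a & ~b;
}

/*
 * Compute the unprunable set for the reported schema.
 * Assumed RT indexes: 1 = t, 2 = tp_1, 3 = tp_2, 4 = tp_2_1, 5 = tp_2_2.
 * If count_subpartitioned is nonzero, the sub-partitioned tp_2 is counted
 * as prunable, which is the behavior being diagnosed.
 */
static relid_set
compute_unprunable(int count_subpartitioned)
{
    relid_set all = 0;
    relid_set prunable = 0;

    for (int rti = 1; rti <= 5; rti++)
        all = relid_set_add(all, rti);

    /* Leaf partitions are always candidates for pruning. */
    prunable = relid_set_add(prunable, 2);
    prunable = relid_set_add(prunable, 4);
    prunable = relid_set_add(prunable, 5);

    if (count_subpartitioned)
        prunable = relid_set_add(prunable, 3);

    return relid_set_difference(all, prunable);
}
```

Calling compute_unprunable(1) leaves only RT index 1 in the result, so
tp_2 (index 3) cannot be opened; compute_unprunable(0) keeps index 3 in
the unprunable set.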
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v1-0001-Fix-bug-in-cbc127917-to-handle-nested-Append-corr.patch (5.4K, 2-v1-0001-Fix-bug-in-cbc127917-to-handle-nested-Append-corr.patch)
From f829cd9a679003b14030679adbd08aae70cefc55 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sun, 23 Feb 2025 17:19:28 +0900
Subject: [PATCH v1] Fix bug in cbc127917 to handle nested Append correctly
A sub-partitioned partition with a subplan that is an Append node was
not correctly reported in PlannedStmt.unprunableRelids. This omission
led to ExecGetRangeTableRelation() reporting an error when called from
CreatePartitionPruneState() to process its PartitionPruneInfo.
Reported-by: Alexander Lakhin <[email protected]> (via sqlsmith)
Diagnosed-by: Tender Wang <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/backend/executor/execPartition.c | 14 +++++++++----
src/backend/partitioning/partprune.c | 8 +++++++-
src/test/regress/expected/partition_prune.out | 20 +++++++++++++++++++
src/test/regress/sql/partition_prune.sql | 7 +++++++
4 files changed, 44 insertions(+), 5 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 432eeaf9034..b86fc5ea297 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2589,9 +2589,9 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
  * find_matching_subplans_recurse
  *      Recursive worker function for ExecFindMatchingSubPlans
  *
- * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
- * of their corresponding leaf partitions to *validsubplan_rtis if
- * it's non-NULL.
+ * Adds valid (non-prunable) subplan IDs to *validsubplans. If
+ * *validsubplan_rtis is non-NULL, it also adds the RT indexes of their
+ * corresponding partitions, but only if they are leaf partitions.
  */
 static void
 find_matching_subplans_recurse(PartitionPruningData *prunedata,
@@ -2628,7 +2628,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
         {
             *validsubplans = bms_add_member(*validsubplans,
                                             pprune->subplan_map[i]);
-            if (validsubplan_rtis)
+
+            /*
+             * Subplan might be a nested Append / MergeAppend for a
+             * sub-partitioned partition whose RT index need not be reported
+             * to the caller.
+             */
+            if (validsubplan_rtis && pprune->leafpart_rti_map[i])
                 *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
                                                     pprune->leafpart_rti_map[i]);
         }
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index ff926732f36..3faf3f8555c 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -687,7 +687,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
                 if (subplanidx >= 0)
                 {
                     present_parts = bms_add_member(present_parts, i);
-                    leafpart_rti_map[i] = (int) partrel->relid;
+
+                    /*
+                     * Subplan might be a nested Append/MergeAppend for a
+                     * sub-partitioned partition.
+                     */
+                    if (partrel->nparts == -1)
+                        leafpart_rti_map[i] = (int) partrel->relid;
 
                     /* Record finding this subplan */
                     subplansfound = bms_add_member(subplansfound, subplanidx);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 6f80b62a3b8..94a499fc8e0 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4590,5 +4590,25 @@ table part_abc_view;
2 | c | t
(1 row)
+-- A case with nested Append with its own PartitionPruneInfo.
+create index on part_abc (a);
+create table part_abc_3 partition of part_abc for values in (3, 4) partition by list (a);
+create table part_abc_3_3 partition of part_abc_3 for values in (3);
+create table part_abc_3_4 partition of part_abc_3 for values in (4);
+explain (costs off) select min(a) over (partition by a order by a) from part_abc where a >= stable_one() + 1;
+                                      QUERY PLAN
+---------------------------------------------------------------------------------------
+ WindowAgg
+   ->  Append
+         Subplans Removed: 1
+         ->  Index Only Scan using part_abc_2_a_idx on part_abc_2 part_abc_1
+               Index Cond: (a >= (stable_one() + 1))
+         ->  Append
+               ->  Index Only Scan using part_abc_3_3_a_idx on part_abc_3_3 part_abc_3
+                     Index Cond: (a >= (stable_one() + 1))
+               ->  Index Only Scan using part_abc_3_4_a_idx on part_abc_3_4 part_abc_4
+                     Index Cond: (a >= (stable_one() + 1))
+(10 rows)
+
drop view part_abc_view;
drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 86621dcec0b..48273f8d027 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1397,5 +1397,12 @@ using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (q.pid = pt1.
when matched then delete returning pt.a;
table part_abc_view;
+-- A case with nested Append with its own PartitionPruneInfo.
+create index on part_abc (a);
+create table part_abc_3 partition of part_abc for values in (3, 4) partition by list (a);
+create table part_abc_3_3 partition of part_abc_3 for values in (3);
+create table part_abc_3_4 partition of part_abc_3 for values in (4);
+explain (costs off) select min(a) over (partition by a order by a) from part_abc where a >= stable_one() + 1;
+
drop view part_abc_view;
drop table part_abc;
--
2.43.0
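
The two C hunks in the patch above cooperate: the planner side stores an
RT index in leafpart_rti_map[] only for leaf partitions, and the
executor side skips entries that were left at zero. A minimal sketch of
that pairing follows. The part_desc struct and both function names are
hypothetical simplifications for illustration, not PostgreSQL's actual
data structures; only the "nparts == -1 means leaf" convention and the
zero sentinel are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified partition descriptor (not PostgreSQL's RelOptInfo). */
typedef struct
{
    int relid;   /* RT index of the partition */
    int nparts;  /* -1 for a leaf, otherwise the number of sub-partitions */
} part_desc;

/*
 * Planner side: record an RT index only for leaf partitions.
 * Entries for sub-partitioned partitions stay 0, meaning "not a leaf".
 */
static void
build_leafpart_rti_map(const part_desc *parts, size_t n, int *map)
{
    assert(parts != NULL && map != NULL);
    for (size_t i = 0; i < n; i++)
        map[i] = (parts[i].nparts == -1) ? parts[i].relid : 0;
}

/*
 * Executor side: collect RT indexes of surviving subplans, skipping the
 * zero entries that stand for nested Append / MergeAppend subplans.
 * Returns the number of RT indexes written to out.
 */
static size_t
collect_valid_rtis(const int *map, const int *survives, size_t n, int *out)
{
    size_t nout = 0;

    assert(map != NULL && survives != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (survives[i] && map[i] != 0)
            out[nout++] = map[i];
    }
    return nout;
}
```

With one leaf (relid 2) and one sub-partitioned partition (relid 3,
nparts 2), the map becomes {2, 0}, and only RT index 2 is reported even
when both subplans survive pruning.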
* Re: generic plans and "initial" pruning
From: Tender Wang @ 2025-02-23 12:46 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
Amit Langote <[email protected]> wrote on Sun, Feb 23, 2025 at 16:36:
> On Sun, Feb 23, 2025 at 2:03 AM Tender Wang <[email protected]> wrote:
> > Alexander Lakhin <[email protected]> wrote on Sat, Feb 22, 2025 at 23:00:
> >> 21.02.2025 05:40, Amit Langote wrote:
> >>
> >> I pushed the final piece yesterday.
> >>
> >>
> >> Please look at new error, produced by the following script,
> >> starting from 525392d57:
> >> CREATE TABLE t(id int) PARTITION BY RANGE (id);
> >> CREATE INDEX idx on t(id);
> >> CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (10) TO (20);
> >> CREATE TABLE tp_2 PARTITION OF t FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(id);
> >> CREATE TABLE tp_2_1 PARTITION OF tp_2 FOR VALUES FROM (21) to (22);
> >> CREATE TABLE tp_2_2 PARTITION OF tp_2 FOR VALUES FROM (22) to (23);
> >> CREATE FUNCTION stable_one() RETURNS INT AS $$ BEGIN RETURN 1; END; $$ LANGUAGE plpgsql STABLE;
> >>
> >> SELECT min(id) OVER (PARTITION BY id ORDER BY id) FROM t WHERE id >= stable_one();
> >>
> >> ERROR: XX000: trying to open a pruned relation
> >> LOCATION: ExecGetRangeTableRelation, execUtils.c:830
> >>
> >> This issue was discovered with SQLsmith.
>
> Thanks for the report.
>
> > The error message was added in commit 525392d57. In this case, estate->es_unpruned_relids contains only 1, which is the RT index of table t.
> > In register_partpruneinfo(), we collect glob->prunableRelids; in this case, it contains 2, 3, 4, and 5. Then we do:
> > result->unprunableRelids = bms_difference(glob->allRelids,
> > glob->prunableRelids);
> > so result->unprunableRelids contains only 1.
> >
> > But tp_2 is also a partitioned table, and its PartitionPruneInfo, created by create_append_plan(), is put at the head of the global list.
> > So we process it first in ExecDoInitialPruning(). The error is then reported because estate->es_unpruned_relids contains only 1.
>
> Thanks for checking.
>
> The RT index of tp_2 should appear in PlannedStmt.unprunableRelids,
> because it needs to be opened in CreatePartitionPruneState() for
> setting up its PartitionPruneInfo. We use ExecGetRangeTableRelation()
> to open it, which expects the relation to be locked; hence the error.
>
> To ensure tp_2 appears in PlannedStmt.unprunableRelids, we should
> prevent make_partitionedrel_pruneinfo() from placing the RT index into
> leafpart_rti_map[], as the current condition for inclusion doesn’t
> account for whether the partition is itself partitioned.
>
> I've come up with the attached.
>
LGTM.
--
Thanks,
Tender Wang
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-25 02:51 UTC (permalink / raw)
To: Tender Wang <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Sun, Feb 23, 2025 at 9:46 PM Tender Wang <[email protected]> wrote:
> Amit Langote <[email protected]> wrote on Sun, Feb 23, 2025 at 16:36:
>> On Sun, Feb 23, 2025 at 2:03 AM Tender Wang <[email protected]> wrote:
>> > Alexander Lakhin <[email protected]> wrote on Sat, Feb 22, 2025 at 23:00:
>> >> Please look at new error, produced by the following script,
>> >> starting from 525392d57:
>> >> CREATE TABLE t(id int) PARTITION BY RANGE (id);
>> >> CREATE INDEX idx on t(id);
>> >> CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (10) TO (20);
>> >> CREATE TABLE tp_2 PARTITION OF t FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(id);
>> >> CREATE TABLE tp_2_1 PARTITION OF tp_2 FOR VALUES FROM (21) to (22);
>> >> CREATE TABLE tp_2_2 PARTITION OF tp_2 FOR VALUES FROM (22) to (23);
>> >> CREATE FUNCTION stable_one() RETURNS INT AS $$ BEGIN RETURN 1; END; $$ LANGUAGE plpgsql STABLE;
>> >>
>> >> SELECT min(id) OVER (PARTITION BY id ORDER BY id) FROM t WHERE id >= stable_one();
>> >>
>> >> ERROR: XX000: trying to open a pruned relation
>> >> LOCATION: ExecGetRangeTableRelation, execUtils.c:830
>> >>
>> >> This issue was discovered with SQLsmith.
>>
>> Thanks for the report.
>>
>> > The error message was added in commit 525392d57. In this case, estate->es_unpruned_relids contains only 1, which is the RT index of table t.
>> > In register_partpruneinfo(), we collect glob->prunableRelids; in this case, it contains 2, 3, 4, and 5. Then we do:
>> > result->unprunableRelids = bms_difference(glob->allRelids,
>> > glob->prunableRelids);
>> > so result->unprunableRelids contains only 1.
>> >
>> > But tp_2 is also a partitioned table, and its PartitionPruneInfo, created by create_append_plan(), is put at the head of the global list.
>> > So we process it first in ExecDoInitialPruning(). The error is then reported because estate->es_unpruned_relids contains only 1.
>>
>> Thanks for checking.
>>
>> The RT index of tp_2 should appear in PlannedStmt.unprunableRelids,
>> because it needs to be opened in CreatePartitionPruneState() for
>> setting up its PartitionPruneInfo. We use ExecGetRangeTableRelation()
>> to open it, which expects the relation to be locked; hence the error.
>>
>> To ensure tp_2 appears in PlannedStmt.unprunableRelids, we should
>> prevent make_partitionedrel_pruneinfo() from placing the RT index into
>> leafpart_rti_map[], as the current condition for inclusion doesn’t
>> account for whether the partition is itself partitioned.
>>
>> I've come up with the attached.
>
> LGTM.
Pushed after some tweaks to comments and the test case.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Tom Lane @ 2025-05-20 03:06 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Amit Langote <[email protected]> writes:
> Pushed after some tweaks to comments and the test case.
My attention was drawn to commit 525392d57 after observing that
Valgrind complained about a memory leak in some code that commit added
to BuildCachedPlan(). I tried to make sense of said code so I could
remove the leak, and eventually arrived at the attached patch, which
is part of a series of leak-fixing things hence the high sequence
number.
Unfortunately, the bad things I speculated about in the added comments
seem to be reality. The second attached file is a test case that
triggers
TRAP: failed Assert("list_length(plan_list) == list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087
because it adds a DO ALSO rule that causes the rewriter to generate
more PlannedStmts than it did before.
This is quite awful, because it does more than simply break the klugy
(and undocumented) business about keeping the top-level List in a
different context. What it means is that any outside code that is
busy iterating that List is very fundamentally broken: it's not clear
what List index it ought to resume at, except that "the one it was at"
is demonstrably incorrect.
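
The iteration hazard described above can be sketched in isolation. This
is a minimal illustration, not PostgreSQL code: stmt_list and both
functions are hypothetical, statements are represented as plain strings,
and the position of the rule's extra statement in the regenerated list
is an assumption made only to show the mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a list of planned statements. */
typedef struct
{
    const char **stmts;
    size_t       len;
} stmt_list;

/* Return the statement at index, or NULL if the index is past the end. */
static const char *
stmt_at(const stmt_list *list, size_t index)
{
    assert(list != NULL);
    return index < list->len ? list->stmts[index] : NULL;
}

/*
 * Check whether a caller that saved its position in "before" would land
 * on the same statement after the list contents were replaced by "after".
 */
static int
resume_is_consistent(const stmt_list *before, const stmt_list *after,
                     size_t saved_index)
{
    const char *a = stmt_at(before, saved_index);
    const char *b = stmt_at(after, saved_index);

    return a != NULL && b != NULL && strcmp(a, b) == 0;
}
```

If the list held one statement before the replan and two afterwards, a
saved index of 0 can refer to a different statement than the one the
caller was working on, and no saved index can be assumed to be correct.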
I also don't really believe the (also undocumented) assumption that
such outside code is in between executions of PlannedStmts of the
List and hence can tolerate those being ripped out and replaced.
I have not attempted to build an example, because the one I have
seems sufficiently damning. But I bet that a recursive function
could be constructed in such a way that an outer execution is
still in progress when an inner call triggers UpdateCachedPlan.
Another small problem (much more easily fixable than the above,
probably) is that summarily setting "plan->is_valid = true"
at the end is not okay. We could already have received an
invalidation that should result in marking the plan stale.
(Holding locks on the tables involved is not sufficient to
prevent that, as there are other sources of inval events.)
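
The lost-invalidation problem can be shown with a small standalone
sketch. Everything here is hypothetical: cached_plan is a two-field
stand-in, and the counter-based check in rebuild_checked() is one
generic way to notice an invalidation that arrived during a rebuild. It
is not something proposed in this thread and is not how plancache.c
works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached plan's validity state. */
typedef struct
{
    int      is_valid;
    unsigned inval_count;   /* bumped on every invalidation event */
} cached_plan;

/* An invalidation event marks the plan stale and is counted. */
static void
invalidate(cached_plan *plan)
{
    assert(plan != NULL);
    plan->is_valid = 0;
    plan->inval_count++;
}

/*
 * Rebuild that sets the flag unconditionally at the end.  If
 * inval_mid_rebuild is nonzero, an invalidation arrives while the
 * rebuild is in progress and is then overwritten.
 */
static void
rebuild_unconditional(cached_plan *plan, int inval_mid_rebuild)
{
    assert(plan != NULL);
    if (inval_mid_rebuild)
        invalidate(plan);
    plan->is_valid = 1;     /* clobbers the invalidation above */
}

/*
 * Rebuild that marks the plan valid only if no invalidation was counted
 * since the rebuild started.
 */
static void
rebuild_checked(cached_plan *plan, int inval_mid_rebuild)
{
    unsigned seen;

    assert(plan != NULL);
    seen = plan->inval_count;
    if (inval_mid_rebuild)
        invalidate(plan);
    plan->is_valid = (plan->inval_count == seen);
}
```

With an invalidation arriving mid-rebuild, rebuild_unconditional()
leaves is_valid set to 1 while rebuild_checked() leaves it at 0.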
It's possible that this code can be fixed, but I fear it's
going to involve some really fundamental redesign, which
probably shouldn't be happening after beta1. I think there
is no alternative but to revert for v18.
regards, tom lane
Attachments:
[text/x-diff] v2-0010-Partially-fix-some-extremely-broken-code-from-52.patch (3.7K, 2-v2-0010-Partially-fix-some-extremely-broken-code-from-52.patch)
From a680e6b6885378beb0164e465b50afd81558ebc5 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Mon, 19 May 2025 00:02:20 -0400
Subject: [PATCH v2 10/20] Partially fix some extremely broken code from
525392d57.
Avoid leaking memory in the stmt_context during BuildCachedPlan.
Sadly, this code has problems a lot worse than that (per the
documentation I added), so I suspect 525392d57 will get reverted
and we won't need this patch.
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/backend/utils/cache/plancache.c | 37 ++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9bcbc4c3e97..40ba3e9df7c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1109,22 +1109,32 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
      */
     if (!plansource->is_oneshot)
     {
+        List       *stmt_plist;
+
         plan_context = AllocSetContextCreate(CurrentMemoryContext,
                                              "CachedPlan",
                                              ALLOCSET_START_SMALL_SIZES);
         MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
 
-        stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+        stmt_context = AllocSetContextCreate(plan_context,
                                              "CachedPlan PlannedStmts",
                                              ALLOCSET_START_SMALL_SIZES);
         MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
-        MemoryContextSetParent(stmt_context, plan_context);
 
+        /*
+         * Copy plans into the stmt_context.
+         */
         MemoryContextSwitchTo(stmt_context);
-        plist = copyObject(plist);
+        stmt_plist = copyObject(plist);
 
+        /*
+         * We actually need the top-level List object to be in the long-lived
+         * plan_context, in case UpdateCachedPlan wants to update it; see
+         * comments therein. Do a shallow copy to make that happen.
+         */
         MemoryContextSwitchTo(plan_context);
-        plist = list_copy(plist);
+        plist = list_copy(stmt_plist);
+        list_free(stmt_plist);  /* be tidy */
     }
     else
         plan_context = CurrentMemoryContext;
@@ -1251,12 +1261,22 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
 
     /*
      * Planning work is done in the caller's memory context. The resulting
-     * PlannedStmt is then copied into plan->stmt_context after throwing away
-     * the old ones.
+     * PlannedStmt(s) are then copied into plan->stmt_context after throwing
+     * away the old ones. But note that we re-use the long-lived
+     * plan->stmt_list list to hold the pointers to the PlannedStmts. This
+     * kluge avoids breaking code that is iterating over that list, so long as
+     * it's between statements and not currently using one of the contained
+     * PlannedStmts.
+     *
+     * XXX this is, if not actively broken, at least unbelievably fragile.
+     * Aside from the likelihood that the just-stated assumption doesn't hold
+     * universally, there is not a good reason to believe that the length of
+     * the plan list is constant.
      */
     plan_list = pg_plan_queries(query_list, plansource->query_string,
                                 plansource->cursor_options, NULL);
-    Assert(list_length(plan_list) == list_length(plan->stmt_list));
+    if (list_length(plan_list) != list_length(plan->stmt_list))
+        elog(ERROR, "UpdateCachedPlan(): plan list length changed");
 
     MemoryContextReset(plan->stmt_context);
     oldcxt = MemoryContextSwitchTo(plan->stmt_context);
@@ -1276,7 +1296,8 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
 
     /*
      * We've updated all the plans that might have been invalidated, so mark
-     * the CachedPlan as valid.
+     * the CachedPlan as valid. XXX wrong: we could already have hit a new
+     * invalidation event.
      */
     plan->is_valid = true;
 
--
2.43.5
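
The patch above relies on the difference between a deep copy of the
plans and a shallow copy of the top-level list. A minimal sketch of
what "shallow" means here follows. The ptr_list type and its functions
are hypothetical and use malloc rather than memory contexts; they only
illustrate that the copy gets its own cell array while the pointed-to
elements stay shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a list of pointers to plan objects. */
typedef struct
{
    void  **cells;
    size_t  len;
} ptr_list;

/*
 * Shallow copy: allocate a new list header and a new cell array, but
 * copy only the pointers, not the objects they point to.
 * Returns NULL on allocation failure.
 */
static ptr_list *
ptr_list_copy_shallow(const ptr_list *src)
{
    ptr_list *dst;

    assert(src != NULL);
    dst = malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;

    dst->len = src->len;
    dst->cells = NULL;
    if (src->len > 0)
    {
        dst->cells = malloc(src->len * sizeof(void *));
        if (dst->cells == NULL)
        {
            free(dst);
            return NULL;
        }
        memcpy(dst->cells, src->cells, src->len * sizeof(void *));
    }
    return dst;
}

/* Free only the list structure; the shared elements are left alone. */
static void
ptr_list_free_shallow(ptr_list *list)
{
    if (list != NULL)
    {
        free(list->cells);
        free(list);
    }
}
```

Because the elements are shared, the list structure and the elements
can have different lifetimes, which is what the comment in the patch
describes for plan_context and stmt_context.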
[text/plain] break_cached_plan.sql (778B, 3-break_cached_plan.sql)
drop table if exists test_table;
CREATE TABLE test_table (a int);

create or replace function doit(r int, a int) returns bool
language plpgsql as $$
begin
  raise notice 'r = %, a = %', r, a;
  if (r = 10) then
    CREATE RULE make_noise AS ON DELETE TO test_table
      DO ALSO INSERT INTO test_table SELECT 2;
    raise notice 'made rule';
  end if;
  if (r = 20 and a = 1) then
    CREATE RULE make_noise_2 AS ON DELETE TO test_table
      DO ALSO INSERT INTO test_table SELECT 3;
    raise notice 'made rule 2';
  end if;
  return true;
end$$;

set plan_cache_mode to force_generic_plan;

DO $$
BEGIN
  FOR r IN 1..30 LOOP
    TRUNCATE test_table;
    INSERT INTO test_table SELECT 1;
    DELETE FROM test_table where doit(r,a);
  END LOOP;
END$$;

table test_table;
* Re: generic plans and "initial" pruning
From: Tomas Vondra @ 2025-05-20 07:59 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Amit Langote <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On 5/20/25 05:06, Tom Lane wrote:
> Amit Langote <[email protected]> writes:
>> Pushed after some tweaks to comments and the test case.
>
> My attention was drawn to commit 525392d57 after observing that
> Valgrind complained about a memory leak in some code that commit added
> to BuildCachedPlan(). I tried to make sense of said code so I could
> remove the leak, and eventually arrived at the attached patch, which
> is part of a series of leak-fixing things hence the high sequence
> number.
>
> Unfortunately, the bad things I speculated about in the added comments
> seem to be reality. The second attached file is a test case that
> triggers
>
> ...
FYI I added this as a PG18 open item:
https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
regards
--
Tomas Vondra
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-05-21 10:22 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, May 21, 2025 at 3:44 AM Tomas Vondra <[email protected]> wrote:
> On 5/20/25 05:06, Tom Lane wrote:
> > Amit Langote <[email protected]> writes:
> >> Pushed after some tweaks to comments and the test case.
> >
> > My attention was drawn to commit 525392d57 after observing that
> > Valgrind complained about a memory leak in some code that commit added
> > to BuildCachedPlan(). I tried to make sense of said code so I could
> > remove the leak, and eventually arrived at the attached patch, which
> > is part of a series of leak-fixing things hence the high sequence
> > number.
> >
> > Unfortunately, the bad things I speculated about in the added comments
> > seem to be reality. The second attached file is a test case that
> > triggers
> >
> > ...
>
> FYI I added this as a PG18 open item:
>
> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
Thanks Tomas.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-05-20 13:25 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi Tom,
On Tue, May 20, 2025 at 12:06 PM Tom Lane <[email protected]> wrote:
> My attention was drawn to commit 525392d57 after observing that
> Valgrind complained about a memory leak in some code that commit added
> to BuildCachedPlan(). I tried to make sense of said code so I could
> remove the leak, and eventually arrived at the attached patch, which
> is part of a series of leak-fixing things hence the high sequence
> number.
>
> Unfortunately, the bad things I speculated about in the added comments
> seem to be reality. The second attached file is a test case that
> triggers
>
> TRAP: failed Assert("list_length(plan_list) == list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087
>
> because it adds a DO ALSO rule that causes the rewriter to generate
> more PlannedStmts than it did before.
>
> This is quite awful, because it does more than simply break the klugy
> (and undocumented) business about keeping the top-level List in a
> different context. What it means is that any outside code that is
> busy iterating that List is very fundamentally broken: it's not clear
> what List index it ought to resume at, except that "the one it was at"
> is demonstrably incorrect.
>
> I also don't really believe the (also undocumented) assumption that
> such outside code is in between executions of PlannedStmts of the
> List and hence can tolerate those being ripped out and replaced.
> I have not attempted to build an example, because the one I have
> seems sufficiently damning. But I bet that a recursive function
> could be constructed in such a way that an outer execution is
> still in progress when an inner call triggers UpdateCachedPlan.
>
> Another small problem (much more easily fixable than the above,
> probably) is that summarily setting "plan->is_valid = true"
> at the end is not okay. We could already have received an
> invalidation that should result in marking the plan stale.
> (Holding locks on the tables involved is not sufficient to
> prevent that, as there are other sources of inval events.)
Thanks for pointing out the hole in the current handling of
CachedPlan->stmt_list. You're right that the approach of preserving
the list structure while replacing its contents in-place doesn’t hold
up when the rewriter adds or removes statements dynamically. There
might be other cases that neither of us has tried. I don’t think
that mechanism is salvageable.
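The indexing hazard can be illustrated with a toy model. This is only a sketch in plain C under stated assumptions, not PostgreSQL code: ToyPlan, toy_replan_in_place(), toy_resume_after_replan() and toy_resume_is_consistent() are hypothetical names, and the statement strings are made up to mimic a DO ALSO rule adding one statement to the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a cached plan's statement list (hypothetical, not the List API). */
typedef struct ToyPlan
{
	const char *const *stmts;
	size_t		nstmts;
} ToyPlan;

/* Made-up statement lists: before and after a rule adds one statement. */
const char *const toy_before_rule[] = {"INSERT INTO t", "SELECT 1"};
const char *const toy_after_rule[] = {"INSERT INTO t", "INSERT INTO log", "SELECT 1"};

/* Hypothetical in-place replan: swaps the list contents under the caller. */
void
toy_replan_in_place(ToyPlan *plan)
{
	plan->stmts = toy_after_rule;
	plan->nstmts = sizeof(toy_after_rule) / sizeof(toy_after_rule[0]);
}

/* The caller resumes at the index it remembered from before the replan. */
const char *
toy_resume_after_replan(ToyPlan *plan, size_t next_index)
{
	toy_replan_in_place(plan);
	assert(next_index < plan->nstmts);
	return plan->stmts[next_index];
}

/* Returns 1 if the caller resumes on the statement it expected, else 0. */
int
toy_resume_is_consistent(ToyPlan *plan, size_t next_index)
{
	const char *expected;
	const char *actual;

	assert(next_index < plan->nstmts);
	expected = plan->stmts[next_index];
	actual = toy_resume_after_replan(plan, next_index);
	return strcmp(expected, actual) == 0;
}
```

In this toy, a caller that remembered index 1 of the two-statement list resumes on the rule-added statement rather than the one it expected, so toy_resume_is_consistent() returns 0 for that index.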
To address the issue without needing a full revert, I’m considering
dropping UpdateCachedPlan() and removing the associated MemoryContext
dance to preserve CachedPlan->stmt_list structure. Instead, the
executor would replan the necessary query into a transient list of
PlannedStmts, leaving the original CachedPlan untouched. That avoids
mutating shared plan state during execution and still enables deferred
locking in the vast majority of cases.
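A minimal sketch of the "transient list" idea, again as a toy in plain C rather than PostgreSQL code. ToyCachedPlan, ToyTransientPlan, toy_replan_transient() and toy_free_transient() are hypothetical names introduced only for illustration; the point shown is that the cached structure is taken as const and never modified, while the replanned list is owned by the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the shared cached plan; callers only ever see it as const. */
typedef struct ToyCachedPlan
{
	const char *const *stmts;
	size_t		nstmts;
} ToyCachedPlan;

/* Toy stand-in for a transient, caller-owned statement list. */
typedef struct ToyTransientPlan
{
	const char **stmts;
	size_t		nstmts;
} ToyTransientPlan;

/*
 * Hypothetical replan: builds a fresh list, optionally with one extra
 * statement appended, and leaves the cached plan untouched.
 * Returns NULL on allocation failure.
 */
ToyTransientPlan *
toy_replan_transient(const ToyCachedPlan *cached, const char *extra_stmt)
{
	ToyTransientPlan *t;
	size_t		n;

	assert(cached != NULL);
	n = cached->nstmts + (extra_stmt != NULL ? 1 : 0);

	t = malloc(sizeof(*t));
	if (t == NULL)
		return NULL;
	t->stmts = malloc((n > 0 ? n : 1) * sizeof(*t->stmts));
	if (t->stmts == NULL)
	{
		free(t);
		return NULL;
	}
	if (cached->nstmts > 0)
		memcpy(t->stmts, cached->stmts, cached->nstmts * sizeof(*t->stmts));
	if (extra_stmt != NULL)
		t->stmts[cached->nstmts] = extra_stmt;
	t->nstmts = n;
	return t;
}

/* Releases a transient list; the statement strings themselves are not owned. */
void
toy_free_transient(ToyTransientPlan *t)
{
	if (t != NULL)
	{
		free(t->stmts);
		free(t);
	}
}
```

Because the cached list is never written to, any outside code still iterating it keeps seeing the same statements at the same indexes; only the current execution sees the transient copy.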
There are two variants of this approach. In the simpler form, the
transient PlannedStmt list exists only in executor-local memory and
isn’t registered with the invalidation machinery. That might be
acceptable in practice, since all referenced relations are locked at
that point -- but it would mean any invalidation events delivered
during execution are ignored. The more robust variant is to build a
one-query standalone CachedPlan using something like
GetTransientCachedPlanForQuery(), which I had proposed back in [1].
This gets added to a standalone_plan_list so that invalidation
callbacks can still reach it. I dropped that design earlier [2] due to
the cleanup overhead, but I’d be happy to bring it back in a
simplified form if that seems preferable.
One open question in either case is what to do if the number of
PlannedStmts in the rewritten plan changes as with your example. Would
it be reasonable to just go ahead and execute the additional
statements from the transient plan, even though the original
CachedPlan wouldn’t have known about them until the next use? That
would avoid introducing any new failure behavior while still handling
the invalidation correctly for the current execution.
> It's possible that this code can be fixed, but I fear it's
> going to involve some really fundamental redesign, which
> probably shouldn't be happening after beta1. I think there
> is no alternative but to revert for v18.
...Beyond that, I think I’ve run out of clean options for making
deferred locking executor-local while keeping invalidation safe. I
know you'd previously objected (with good reason) to making
GetCachedPlan() itself run pruning logic to determine which partitions
to lock -- and to the idea of carrying or sharing the result of that
pruning back to the executor via interface changes in the path from
plancache.c through its callers down to ExecutorStart(). So I’ve
steered away from revisiting that direction. But if we’re not
comfortable with either of the transient replanning options, then we
may end up shelving the deferred locking idea entirely -- which would
be unfortunate, given how much it helps workloads that rely on generic
plans over large partitioned tables.
Let me know what you think -- I’ll hold off on posting a revert or a
replacement until we’ve agreed on the path forward.
--
Thanks, Amit Langote
[1] https://www.postgresql.org/message-id/CA%2BHiwqGSOge3eT3kcm_nxCSA3Ut%2Bd0jtchi8g8J9uXi-kyC7Jw%40mail...
[2] https://www.postgresql.org/message-id/CA%2BHiwqHRRFQN6yZ54fBydOTM6ncqZBCmewZ6n519RjRdDsO44g%40mail.g...
* Re: generic plans and "initial" pruning
From: Tom Lane @ 2025-05-20 15:38 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Amit Langote <[email protected]> writes:
> Thanks for pointing out the hole in the current handling of
> CachedPlan->stmt_list. You're right that the approach of preserving
> the list structure while replacing its contents in-place doesn’t hold
> up when the rewriter adds or removes statements dynamically. There
> might be other cases that neither of us has tried. I don’t think
> that mechanism is salvageable.
> To address the issue without needing a full revert, I’m considering
> dropping UpdateCachedPlan() and removing the associated MemoryContext
> dance to preserve CachedPlan->stmt_list structure. Instead, the
> executor would replan the necessary query into a transient list of
> PlannedStmts, leaving the original CachedPlan untouched. That avoids
> mutating shared plan state during execution and still enables deferred
> locking in the vast majority of cases.
Yeah, I think messing with the CachedPlan is just fundamentally wrong.
It breaks the invariant that the executor should not scribble on what
it's handed --- maybe not as obviously as some other cases, but it's
still not a good design.
I kind of feel that we ought to take two steps back and think
about what it even means to have a generic plan in this situation.
Perhaps we should simply refuse to use that code path if there are
prunable partitioned tables involved?
> Let me know what you think -- I’ll hold off on posting a revert or a
> replacement until we’ve agreed on the path forward.
I had not looked at 525392d57 in any detail before (the claim in
the commit message that I reviewed it is a figment of someone's
imagination). Now that I have, I'm still going to argue for revert.
Aside from the points above, I really hate what's been done to the
fundamental executor APIs. The fact that ExecutorStart callers have
to know about this is as ugly as can be. I also don't like the
fact that it's added overhead in cases where there can be no benefit
(notice that my test case doesn't even involve a partitioned table).
I still like the core idea of deferring locking, but I don't like
anything about this implementation of it. It seems like there has
to be a better and simpler way.
regards, tom lane
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-05-21 10:22 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, May 21, 2025 at 12:38 AM Tom Lane <[email protected]> wrote:
> Amit Langote <[email protected]> writes:
> > Thanks for pointing out the hole in the current handling of
> > CachedPlan->stmt_list. You're right that the approach of preserving
> > the list structure while replacing its contents in-place doesn’t hold
> > up when the rewriter adds or removes statements dynamically. There
> > might be other cases that neither of us has tried. I don’t think
> > that mechanism is salvageable.
>
> > To address the issue without needing a full revert, I’m considering
> > dropping UpdateCachedPlan() and removing the associated MemoryContext
> > dance to preserve CachedPlan->stmt_list structure. Instead, the
> > executor would replan the necessary query into a transient list of
> > PlannedStmts, leaving the original CachedPlan untouched. That avoids
> > mutating shared plan state during execution and still enables deferred
> > locking in the vast majority of cases.
>
> Yeah, I think messing with the CachedPlan is just fundamentally wrong.
> It breaks the invariant that the executor should not scribble on what
> it's handed --- maybe not as obviously as some other cases, but it's
> still not a good design.
Fair enough. I’ll revert this and some related changes shortly. WIP
patch attached.
> I kind of feel that we ought to take two steps back and think
> about what it even means to have a generic plan in this situation.
> Perhaps we should simply refuse to use that code path if there are
> prunable partitioned tables involved?
Sorry, I’m not sure I fully understand -- especially what you mean by
“that code path.” If you're referring to the generic plan creation and
reuse path in general, I'd point out that initial runtime pruning was
introduced largely to improve the efficiency of generic plan execution
(albeit without addressing the locking bottleneck at the time -- David
Rowley had explored that earlier). So simply disallowing generic plans
when partitions are involved feels like an odd direction, given that a
major motivation for initial pruning was to make those cases faster.
Custom plans can win when parameters are available, of course, but
there's a major use case involving stable expressions like now() with
time-based partitions, where plan_cache_mode = auto will still choose
a generic plan. So I wouldn’t say that optimizing generic plan
execution, which is the goal of this project, is wasted effort
in practice.
> > Let me know what you think -- I’ll hold off on posting a revert or a
> > replacement until we’ve agreed on the path forward.
>
> I had not looked at 525392d57 in any detail before (the claim in
> the commit message that I reviewed it is a figment of someone's
> imagination).
Apologies if I gave the misleading impression that you were on board
with the current design. I meant only to acknowledge your earlier
engagement with the general idea, which I appreciated. I marked it as
“(old versions)” in the commit metadata to reflect that -- clearly I
should’ve been more precise. I know that the meaning of Reviewed-by
and other tags is evolving and I clearly haven't kept up.
> Now that I have, I'm still going to argue for revert.
> Aside from the points above, I really hate what's been done to the
> fundamental executor APIs. The fact that ExecutorStart callers have
> to know about this is as ugly as can be. I also don't like the
> fact that it's added overhead in cases where there can be no benefit
> (notice that my test case doesn't even involve a partitioned table).
I tried to keep the overhead low by ensuring that the only additional
thing we'd be doing in the regular path is a CachedPlan->is_valid
boolean check in a couple of places, and that further work would only
happen if invalidation actually occurred. That said, I realize the
patch makes invalidation handling apply in more cases than before,
which may itself be seen as added overhead. But I may have
misunderstood your concern -- perhaps it's more about the layering
violation than the raw cycles?
> I still like the core idea of deferring locking, but I don't like
> anything about this implementation of it. It seems like there has
> to be a better and simpler way.
It's good to hear that you still like the core idea -- I’d really
appreciate it if you're willing to continue bearing with me as I try
to rework this in a way that's cleaner and better aligned with the
overall design. I'd welcome any thoughts you have along the way. I
know this has been a difficult project, and I don't mean to come
across as taking any of it lightly. I'm still hopeful there's a path
forward, but I completely understand the need to reset here.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v1-0001-Revert-Don-t-lock-partitions-pruned-by-initial-pr.patch (66.5K, 2-v1-0001-Revert-Don-t-lock-partitions-pruned-by-initial-pr.patch)
From 260d3fbf4801402f1a2ffd947f1f05fd3cad6878 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 21 May 2025 18:46:52 +0900
Subject: [PATCH v1] Revert "Don't lock partitions pruned by initial pruning"
As pointed out by Tom Lane, the patch introduced fragile and invasive
design around plan invalidation handling when locking of prunable
partitions was deferred from plancache.c to the executor. In
particular, it violated assumptions about CachedPlan immutability and
altered executor APIs in ways that are difficult to justify given the
added complexity and overhead.
This also removes the firstResultRels field added to PlannedStmt in
commit 28317de72, which was intended to support deferred locking of
certain ModifyTable result relations.
Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
contrib/auto_explain/auto_explain.c | 16 +-
.../pg_stat_statements/pg_stat_statements.c | 16 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 5 +-
src/backend/commands/explain.c | 22 +-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 5 +-
src/backend/commands/portalcmds.c | 1 -
src/backend/commands/prepare.c | 9 +-
src/backend/commands/trigger.c | 15 --
src/backend/executor/README | 35 +---
src/backend/executor/execMain.c | 127 +----------
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 67 +-----
src/backend/executor/execUtils.c | 1 -
src/backend/executor/functions.c | 4 +-
src/backend/executor/spi.c | 29 +--
src/backend/optimizer/plan/planner.c | 2 -
src/backend/optimizer/plan/setrefs.c | 3 -
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 51 +----
src/backend/utils/cache/plancache.c | 197 +++---------------
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 6 +-
src/include/commands/trigger.h | 1 -
src/include/executor/execdesc.h | 2 -
src/include/executor/executor.h | 33 +--
src/include/nodes/execnodes.h | 3 -
src/include/nodes/pathnodes.h | 3 -
src/include/nodes/plannodes.h | 7 -
src/include/utils/plancache.h | 46 +---
src/include/utils/portal.h | 4 +-
32 files changed, 88 insertions(+), 651 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index cd6625020a7..1f4badb4928 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -81,7 +81,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -261,11 +261,9 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static bool
+static void
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- bool plan_valid;
-
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -301,13 +299,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- plan_valid = prev_ExecutorStart(queryDesc, eflags);
+ prev_ExecutorStart(queryDesc, eflags);
else
- plan_valid = standard_ExecutorStart(queryDesc, eflags);
-
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!plan_valid)
- return false;
+ standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -325,8 +319,6 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
-
- return true;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..d8fdf42df79 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -335,7 +335,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -989,19 +989,13 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static bool
+static void
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- bool plan_valid;
-
if (prev_ExecutorStart)
- plan_valid = prev_ExecutorStart(queryDesc, eflags);
+ prev_ExecutorStart(queryDesc, eflags);
else
- plan_valid = standard_ExecutorStart(queryDesc, eflags);
-
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!plan_valid)
- return false;
+ standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1024,8 +1018,6 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
-
- return true;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f87e405351d..ea6f18f2c80 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -835,7 +835,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -845,8 +845,7 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- if (!ExecutorStart(cstate->queryDesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0a4155773eb..dfd2ab8e862 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,13 +334,12 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- if (!ExecutorStart(queryDesc, GetIntoRelEFlags(into)))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 786ee865f14..09ea30dfb92 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -369,8 +369,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
- queryEnv,
+ ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,9 +491,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
- CachedPlanSource *plansource, int query_index,
- IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -550,7 +547,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -564,17 +561,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* Prepare the plan for execution. */
- if (queryDesc->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ /* call ExecutorStart to prepare the plan for execution */
+ ExecutorStart(queryDesc, eflags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 73c52e970f6..e6f9ab6dfd6 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,13 +993,11 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
- NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- if (!ExecutorStart(qdesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index e7854add178..27c2cb26ef5 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,13 +438,12 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, NULL, queryString,
+ queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- if (!ExecutorStart(queryDesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4c2ac045224..e7c8171c102 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -117,7 +117,6 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
- NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index bf7d2b2309f..34b6410d6a2 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,8 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan,
- entry->plansource);
+ cplan);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -586,7 +585,6 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
- int query_index = 0;
if (es->memory)
{
@@ -659,8 +657,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
- into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -671,8 +668,6 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
-
- query_index++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index c9f61130c69..67f8e70f9c1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5057,21 +5057,6 @@ AfterTriggerBeginQuery(void)
}
-/* ----------
- * AfterTriggerAbortQuery()
- *
- * Called by standard_ExecutorEnd() if the query execution was aborted due to
- * the plan becoming invalid during initialization.
- * ----------
- */
-void
-AfterTriggerAbortQuery(void)
-{
- /* Revert the actions of AfterTriggerBeginQuery(). */
- afterTriggers.query_depth--;
-}
-
-
/* ----------
* AfterTriggerEndQuery()
*
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 02745c23ed9..54f4782f31b 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -285,28 +285,6 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
-Relation Locking
-----------------
-
-When the executor initializes a plan tree for execution, it doesn't lock
-non-index relations if the plan tree is freshly generated and not derived
-from a CachedPlan. This is because such locks have already been established
-during the query's parsing, rewriting, and planning phases. However, with a
-cached plan tree, some relations may remain unlocked. The function
-AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
-the locking of prunable ones to executor initialization. This avoids
-unnecessary locking of relations that will be pruned during "initial" runtime
-pruning in ExecDoInitialPruning().
-
-This approach creates a window where a cached plan tree with child tables
-could become outdated if another backend modifies these tables before
-ExecDoInitialPruning() locks them. As a result, the executor has the added duty
-to verify the plan tree's validity whenever it locks a child table after
-doing initial pruning. This validation is done by checking the CachedPlan.is_valid
-flag. If the plan tree is outdated (is_valid = false), the executor stops
-further initialization, cleans up anything in EState that would have been
-allocated up to that point, and retries execution after recreating the
-invalid plan in the CachedPlan. See ExecutorStartCachedPlan().
Query Processing Control Flow
-----------------------------
@@ -315,13 +293,11 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart or ExecutorStartCachedPlan
+ ExecutorStart
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecDoInitialPruning and ExecInitNode
+ switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
- ExecDoInitialPruning
- does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -345,12 +321,7 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-As mentioned in the "Relation Locking" section, if the plan tree is found to
-be stale after locking partitions in ExecDoInitialPruning(), the control is
-immediately returned to ExecutorStartCachedPlan(), which will create a new plan
-tree and perform the steps starting from CreateExecutorState() again.
-
-Per above comments, it's not really critical for ExecEndPlan to free any
+Per above comments, it's not really critical for ExecEndNode to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7230f968101..0391798dd2c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,13 +55,11 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
-#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
-#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -119,16 +117,11 @@ static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
- * Return value indicates if the plan has been initialized successfully so
- * that queryDesc->planstate contains a valid PlanState tree. It may not
- * if the plan got invalidated during InitPlan().
* ----------------------------------------------------------------
*/
-bool
+void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- bool plan_valid;
-
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -140,14 +133,12 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- plan_valid = (*ExecutorStart_hook) (queryDesc, eflags);
+ (*ExecutorStart_hook) (queryDesc, eflags);
else
- plan_valid = standard_ExecutorStart(queryDesc, eflags);
-
- return plan_valid;
+ standard_ExecutorStart(queryDesc, eflags);
}
-bool
+void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
EState *estate;
@@ -271,64 +262,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
-
- return ExecPlanStillValid(queryDesc->estate);
-}
-
-/*
- * ExecutorStartCachedPlan
- * Start execution for a given query in the CachedPlanSource, replanning
- * if the plan is invalidated due to deferred locks taken during the
- * plan's initialization
- *
- * This function handles cases where the CachedPlan given in queryDesc->cplan
- * might become invalid during the initialization of the plan given in
- * queryDesc->plannedstmt, particularly when prunable relations in it are
- * locked after performing initial pruning. If the locks invalidate the plan,
- * the function calls UpdateCachedPlan() to replan all queries in the
- * CachedPlan, and then retries initialization.
- *
- * The function repeats the process until ExecutorStart() successfully
- * initializes the plan, that is without the CachedPlan becoming invalid.
- */
-void
-ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
- CachedPlanSource *plansource,
- int query_index)
-{
- if (unlikely(queryDesc->cplan == NULL))
- elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
- if (unlikely(plansource == NULL))
- elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlanSource");
-
- /*
- * Loop and retry with an updated plan until no further invalidation
- * occurs.
- */
- while (1)
- {
- if (!ExecutorStart(queryDesc, eflags))
- {
- /*
- * Clean up the current execution state before creating the new
- * plan to retry ExecutorStart(). Mark execution as aborted to
- * ensure that AFTER trigger state is properly reset.
- */
- queryDesc->estate->es_aborted = true;
- ExecutorEnd(queryDesc);
-
- /* Retry ExecutorStart() with an updated plan tree. */
- queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
- queryDesc->queryEnv);
- }
- else
-
- /*
- * Exit the loop if the plan is initialized successfully and no
- * sinval messages were received that invalidated the CachedPlan.
- */
- break;
- }
}
/* ----------------------------------------------------------------
@@ -387,7 +320,6 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
- Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -494,11 +426,8 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /*
- * This should be run once and only once per Executor instance and never
- * if the execution was aborted.
- */
- Assert(!estate->es_finished && !estate->es_aborted);
+ /* This should be run once and only once per Executor instance */
+ Assert(!estate->es_finished);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -561,10 +490,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
(PgStat_Counter) estate->es_parallel_workers_launched);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
- * execution was aborted.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
+ * Assert is needed because ExecutorFinish is new as of 9.1, and callers
+ * might forget to call it.
*/
- Assert(estate->es_finished || estate->es_aborted ||
+ Assert(estate->es_finished ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -578,14 +508,6 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
- /*
- * Reset AFTER trigger module if the query execution was aborted.
- */
- if (estate->es_aborted &&
- !(estate->es_top_eflags &
- (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
- AfterTriggerAbortQuery();
-
/*
* Must switch out of context before destroying it
*/
@@ -684,21 +606,6 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
- /*
- * Ensure that we have at least an AccessShareLock on relations
- * whose permissions need to be checked.
- *
- * Skip this check in a parallel worker because locks won't be
- * taken until ExecInitNode() performs plan initialization.
- *
- * XXX: ExecCheckPermissions() in a parallel worker may be
- * redundant with the checks done in the leader process, so this
- * should be reviewed to ensure it’s necessary.
- */
- Assert(IsParallelWorker() ||
- CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
- true));
-
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -924,12 +831,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
- *
- * If the plan originates from a CachedPlan (given in queryDesc->cplan),
- * it can become invalid during runtime "initial" pruning when the
- * remaining set of locks is taken. The function returns early in that
- * case without initializing the plan, and the caller is expected to
- * retry with a new valid plan.
* ----------------------------------------------------------------
*/
static void
@@ -937,7 +838,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
- CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -958,7 +858,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = cachedplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -972,9 +871,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
*/
ExecDoInitialPruning(estate);
- if (!ExecPlanStillValid(estate))
- return;
-
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -3092,9 +2988,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
- *
- * es_cachedplan is not copied because EPQ plan execution does not acquire
- * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 39c990ae638..f3e77bda279 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1278,15 +1278,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /*
- * Create a QueryDesc for the query. We pass NULL for cachedplan, because
- * we don't have a pointer to the CachedPlan in the leader's process. It's
- * fine because the only reason the executor needs to see it is to decide
- * if it should take locks on certain relations, but parallel workers
- * always take locks anyway.
- */
+ /* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1471,8 +1464,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- if (!ExecutorStart(queryDesc, fpes->eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 3f8a4cb5244..3299db22bd5 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,7 +26,6 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
-#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -1771,8 +1770,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * all plan nodes that contain a PartitionPruneInfo. This also locks the
- * leaf partitions whose subnodes will be initialized if needed.
+ * all plan nodes that contain a PartitionPruneInfo.
*
* ExecInitPartitionExecPruning:
* Updates the PartitionPruneState found at given part_prune_index in
@@ -1793,13 +1791,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
-
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * plan nodes that support partition pruning. This also locks the leaf
- * partitions whose subnodes will be initialized if needed.
+ * plan nodes that support partition pruning.
*
* This function iterates over each PartitionPruneInfo entry in
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
@@ -1821,9 +1817,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
void
ExecDoInitialPruning(EState *estate)
{
- PlannedStmt *stmt = estate->es_plannedstmt;
ListCell *lc;
- List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
@@ -1849,68 +1843,11 @@ ExecDoInitialPruning(EState *estate)
else
validsubplan_rtis = all_leafpart_rtis;
- if (ExecShouldLockRelations(estate))
- {
- int rtindex = -1;
-
- while ((rtindex = bms_next_member(validsubplan_rtis,
- rtindex)) >= 0)
- {
- RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
-
- Assert(rte->rtekind == RTE_RELATION &&
- rte->rellockmode != NoLock);
- LockRelationOid(rte->relid, rte->rellockmode);
- locked_relids = lappend_int(locked_relids, rtindex);
- }
- }
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
-
- /*
- * Lock the first result relation of each ModifyTable node, even if it was
- * pruned. This is required for ExecInitModifyTable(), which keeps its
- * first result relation if all other result relations have been pruned,
- * because some executor paths (e.g., in nodeModifyTable.c and
- * execPartition.c) rely on there being at least one result relation.
- *
- * There's room for improvement here --- we actually only need to do this
- * if all other result relations of the ModifyTable node were pruned, but
- * we don't have an easy way to tell that here.
- */
- if (stmt->resultRelations && ExecShouldLockRelations(estate))
- {
- foreach(lc, stmt->firstResultRels)
- {
- Index firstResultRel = lfirst_int(lc);
-
- if (!bms_is_member(firstResultRel, estate->es_unpruned_relids))
- {
- RangeTblEntry *rte = exec_rt_fetch(firstResultRel, estate);
-
- Assert(rte->rtekind == RTE_RELATION && rte->rellockmode != NoLock);
- LockRelationOid(rte->relid, rte->rellockmode);
- locked_relids = lappend_int(locked_relids, firstResultRel);
- }
- }
- }
-
- /*
- * Release the useless locks if the plan won't be executed. This is the
- * same as what CheckCachedPlan() in plancache.c does.
- */
- if (!ExecPlanStillValid(estate))
- {
- foreach(lc, locked_relids)
- {
- RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
-
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
}
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 772c86e70e9..fdc65c2b42b 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,7 +147,6 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
- estate->es_aborted = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 8d4d062d579..b1f9c17f98a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1338,7 +1338,6 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
- NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1363,8 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- if (!ExecutorStart(es->qd, eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3288396def3..ecb2e4ccaa1 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,8 +70,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
- CachedPlanSource *plansource, int query_index);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1686,8 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan,
- plansource);
+ cplan);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2502,7 +2500,6 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
- int query_index = 0;
spicallbackarg.query = plansource->query_string;
@@ -2693,16 +2690,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
- cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
-
- res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
- plansource, query_index);
+ res = _SPI_pquery(qdesc, fire_triggers,
+ canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2799,8 +2794,6 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
-
- query_index++;
}
/* Done with this plan, so release refcount */
@@ -2878,8 +2871,7 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
- CachedPlanSource *plansource, int query_index)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
{
int operation = queryDesc->operation;
int eflags;
@@ -2935,16 +2927,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- if (queryDesc->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ ExecutorStart(queryDesc, eflags);
ExecutorRun(queryDesc, ForwardScanDirection, tcount);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 49ad6e83578..ff65867eebe 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -331,7 +331,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->finalrteperminfos = NIL;
glob->finalrowmarks = NIL;
glob->resultRelations = NIL;
- glob->firstResultRels = NIL;
glob->appendRelations = NIL;
glob->partPruneInfos = NIL;
glob->relationOids = NIL;
@@ -571,7 +570,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
- result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 150e9f060ee..999a5a8ab5a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1248,9 +1248,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
- root->glob->firstResultRels =
- lappend_int(root->glob->firstResultRels,
- linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1ae51b1b391..92ddeba78fd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1226,7 +1226,6 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
- NULL,
NULL);
/*
@@ -2028,8 +2027,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan,
- psrc);
+ cplan);
/* Portal is defined, set the plan ID based on its contents. */
foreach(lc, portal->stmts)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 8164d0fbb4f..d1593f38b35 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,7 +19,6 @@
#include "access/xact.h"
#include "commands/prepare.h"
-#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
@@ -38,9 +37,6 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
- CachedPlan *cplan,
- CachedPlanSource *plansource,
- int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -70,7 +66,6 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
- CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -83,7 +78,6 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
- qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -129,9 +123,6 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
- * cplan: CachedPlan supplying the plan
- * plansource: CachedPlanSource supplying the cplan
- * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -144,9 +135,6 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
- CachedPlan *cplan,
- CachedPlanSource *plansource,
- int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -158,23 +146,14 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, cplan, sourceText,
+ queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/*
- * Prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution
*/
- if (queryDesc->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, 0, plansource, query_index);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ ExecutorStart(queryDesc, 0);
/*
* Run the plan to completion.
@@ -515,7 +494,6 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
- portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -535,19 +513,9 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Prepare the plan for execution.
+ * Call ExecutorStart to prepare the plan for execution
*/
- if (portal->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, myeflags,
- portal->plansource, 0);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, myeflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ ExecutorStart(queryDesc, myeflags);
/*
* This tells PortalCleanup to shut down the executor
@@ -1221,7 +1189,6 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
- int query_index = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1303,9 +1270,6 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
- portal->cplan,
- portal->plansource,
- query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1315,9 +1279,6 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
- portal->cplan,
- portal->plansource,
- query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1382,8 +1343,6 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
-
- query_index++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9bcbc4c3e97..89a1c79e984 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -92,8 +92,7 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv,
- bool release_generic);
+ QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -663,17 +662,10 @@ BuildingPlanRequiresSnapshot(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
- *
- * This also releases and drops the generic plan (plansource->gplan), if any,
- * as most callers will typically build a new CachedPlan for the plansource
- * right after this. However, when called from UpdateCachedPlan(), the
- * function does not release the generic plan, as UpdateCachedPlan() updates
- * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv,
- bool release_generic)
+ QueryEnvironment *queryEnv)
{
bool snapshot_set;
List *tlist; /* transient query-tree list */
@@ -772,9 +764,8 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference, if any, and if requested */
- if (release_generic)
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference if any */
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -937,10 +928,8 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired locks on the "unprunableRelids" set
- * for all plans in plansource->stmt_list. However, the plans are not fully
- * race-condition-free until the executor acquires locks on the prunable
- * relations that survive initial runtime pruning during InitPlan().
+ * On a "true" return, we have acquired the locks needed to run the plan.
+ * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -1025,8 +1014,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
- *
- * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -1037,7 +1024,6 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
- MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -1055,7 +1041,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv, true);
+ qlist = RevalidateCachedQuery(plansource, queryEnv);
/*
* If we don't already have a copy of the querytree list that can be
@@ -1093,19 +1079,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally, we create a dedicated memory context for the CachedPlan and
- * its subsidiary data. Although it's usually not very large, the context
- * is designed to allow growth if necessary.
- *
- * The PlannedStmts are stored in a separate child context (stmt_context)
- * of the CachedPlan's memory context. This separation allows
- * UpdateCachedPlan() to free and replace the PlannedStmts without
- * affecting the CachedPlan structure or its stmt_list List.
- *
- * For one-shot plans, we instead use the caller's memory context, as the
- * CachedPlan will not persist. stmt_context will be set to NULL in this
- * case, because UpdateCachedPlan() should never get called on a one-shot
- * plan.
+ * Normally we make a dedicated memory context for the CachedPlan and its
+ * subsidiary data. (It's probably not going to be large, but just in
+ * case, allow it to grow large. It's transient for the moment.) But for
+ * a one-shot plan, we just leave it in the caller's memory context.
*/
if (!plansource->is_oneshot)
{
@@ -1114,17 +1091,12 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- stmt_context = AllocSetContextCreate(CurrentMemoryContext,
- "CachedPlan PlannedStmts",
- ALLOCSET_START_SMALL_SIZES);
- MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
- MemoryContextSetParent(stmt_context, plan_context);
+ /*
+ * Copy plan into the new context.
+ */
+ MemoryContextSwitchTo(plan_context);
- MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
-
- MemoryContextSwitchTo(plan_context);
- plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1165,10 +1137,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
- plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
- plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1179,113 +1149,6 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
return plan;
}
-/*
- * UpdateCachedPlan
- * Create fresh plans for all queries in the CachedPlanSource, replacing
- * those in the generic plan's stmt_list, and return the plan for the
- * query_index'th query.
- *
- * This function is primarily used by ExecutorStartCachedPlan() to handle
- * cases where the original generic CachedPlan becomes invalid. Such
- * invalidation may occur when prunable relations in the old plan for the
- * query_index'th query are locked in preparation for execution.
- *
- * Note that invalidations received during the execution of the query_index'th
- * query can affect both the queries that have already finished execution
- * (e.g., due to concurrent modifications on prunable relations that were not
- * locked during their execution) and also the queries that have not yet been
- * executed. As a result, this function updates all plans to ensure
- * CachedPlan.is_valid is safely set to true.
- *
- * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
- * the caller and any of its callers must not rely on them remaining accessible
- * after this function is called.
- */
-PlannedStmt *
-UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
- QueryEnvironment *queryEnv)
-{
- List *query_list = plansource->query_list,
- *plan_list;
- ListCell *l1,
- *l2;
- CachedPlan *plan = plansource->gplan;
- MemoryContext oldcxt;
-
- Assert(ActiveSnapshotSet());
-
- /* Sanity checks (XXX can be Asserts?) */
- if (plan == NULL)
- elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
- else if (plan->is_valid)
- elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
- else if (plan->is_oneshot)
- elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
-
- /*
- * The plansource might have become invalid since GetCachedPlan() returned
- * the CachedPlan. See the comment in BuildCachedPlan() for details on why
- * this might happen. Although invalidation is likely a false positive as
- * stated there, we make the plan valid to ensure the query list used for
- * planning is up to date.
- *
- * The risk of catching an invalidation is higher here than when
- * BuildCachedPlan() is called from GetCachedPlan(), because this function
- * is normally called long after GetCachedPlan() returns the CachedPlan,
- * so much more processing could have occurred including things that mark
- * the CachedPlanSource invalid.
- *
- * Note: Do not release plansource->gplan, because the upstream callers
- * (such as the callers of ExecutorStartCachedPlan()) would still be
- * referencing it.
- */
- if (!plansource->is_valid)
- query_list = RevalidateCachedQuery(plansource, queryEnv, false);
- Assert(query_list != NIL);
-
- /*
- * Build a new generic plan for all the queries after making a copy to be
- * scribbled on by the planner.
- */
- query_list = copyObject(query_list);
-
- /*
- * Planning work is done in the caller's memory context. The resulting
- * PlannedStmt is then copied into plan->stmt_context after throwing away
- * the old ones.
- */
- plan_list = pg_plan_queries(query_list, plansource->query_string,
- plansource->cursor_options, NULL);
- Assert(list_length(plan_list) == list_length(plan->stmt_list));
-
- MemoryContextReset(plan->stmt_context);
- oldcxt = MemoryContextSwitchTo(plan->stmt_context);
- forboth(l1, plan_list, l2, plan->stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst(l1);
-
- lfirst(l2) = copyObject(plannedstmt);
- }
- MemoryContextSwitchTo(oldcxt);
-
- /*
- * XXX Should this also (re)set the properties of the CachedPlan that are
- * set in BuildCachedPlan() after creating the fresh plans such as
- * planRoleId, dependsOnRole, and saved_xmin?
- */
-
- /*
- * We've updated all the plans that might have been invalidated, so mark
- * the CachedPlan as valid.
- */
- plan->is_valid = true;
-
- /* Also update generic_cost because we just created a new generic plan. */
- plansource->generic_cost = cached_plan_cost(plan, false);
-
- return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
-}
-
/*
* choose_custom_plan: choose whether to use custom or generic plan
*
@@ -1402,13 +1265,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid, but if it is a reused generic plan, not all
- * locks are acquired. In such cases, CheckCachedPlan() does not take locks
- * on relations subject to initial runtime pruning; instead, these locks are
- * deferred until execution startup, when ExecDoInitialPruning() performs
- * initial pruning. The plan's "is_reused" flag is set to indicate that
- * CachedPlanRequiresLocking() should return true when called by
- * ExecDoInitialPruning().
+ * On return, the plan is valid and we have sufficient locks to begin
+ * execution.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1434,7 +1292,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv, true);
+ qlist = RevalidateCachedQuery(plansource, queryEnv);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1446,8 +1304,6 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Reusing the existing plan, so not all locks may be acquired. */
- plan->is_reused = true;
}
else
{
@@ -1913,7 +1769,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv, true);
+ RevalidateCachedQuery(plansource, queryEnv);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -2035,7 +1891,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- int rtindex;
+ ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -2053,16 +1909,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- rtindex = -1;
- while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
- rtindex)) >= 0)
+ foreach(lc2, plannedstmt->rtable)
{
- RangeTblEntry *rte = list_nth_node(RangeTblEntry,
- plannedstmt->rtable,
- rtindex - 1);
+ RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
- Assert(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+ if (!(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
+ continue;
/*
* Acquire the appropriate type of lock on each relation OID. Note
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index e3526e78064..0be1c2b0fff 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,8 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan,
- CachedPlanSource *plansource)
+ CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -300,7 +299,6 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
- portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 03c5b3d73e5..3b122f79ed8 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,10 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
struct ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
- CachedPlanSource *plansource, int query_index,
- IntoClause *into, struct ExplainState *es,
- const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+ struct ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 4180601dcd4..2ed2c4bb378 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,7 +258,6 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
-extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..86db3dc8d0d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,7 +35,6 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
- CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -58,7 +57,6 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
- CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ae99407db89..fbe4bf081f7 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -229,11 +229,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
- CachedPlanSource *plansource,
- int query_index);
-extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -300,30 +297,6 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
-/*
- * Is the CachedPlan in es_cachedplan still valid?
- *
- * Called from InitPlan() because invalidation messages that affect the plan
- * might be received after locks have been taken on runtime-prunable relations.
- * The caller should take appropriate action if the plan has become invalid.
- */
-static inline bool
-ExecPlanStillValid(EState *estate)
-{
- return estate->es_cachedplan == NULL ? true :
- CachedPlanValid(estate->es_cachedplan);
-}
-
-/*
- * Locks are needed only if running a cached plan that might contain unlocked
- * relations, such as a reused generic plan.
- */
-static inline bool
-ExecShouldLockRelations(EState *estate)
-{
- return estate->es_cachedplan == NULL ? false :
- CachedPlanRequiresLocking(estate->es_cachedplan);
-}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5b6cadb5a6c..2492282213f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,7 +42,6 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
-#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -664,7 +663,6 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
- CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
@@ -717,7 +715,6 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
- bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1dd2d1560cb..6567759595d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -138,9 +138,6 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
- /* "flat" list of integer RT indexes (one per ModifyTable node) */
- List *firstResultRels;
-
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 658d76225e4..f0d514e6e15 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -105,13 +105,6 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
- /*
- * rtable indexes of first target relation in each ModifyTable node in the
- * plan for INSERT/UPDATE/DELETE/MERGE
- */
- /* integer list of RT indexes, or NIL */
- List *firstResultRels;
-
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 07ec5318db7..1baa6d50bfd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,8 +18,6 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
-#include "nodes/parsenodes.h"
-#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -153,11 +151,10 @@ typedef struct CachedPlanSource
* The reference count includes both the link from the parent CachedPlanSource
* (if any), and any active plan executions, so the plan can be discarded
* exactly when refcount goes to zero. Both the struct itself and the
- * subsidiary data, except the PlannedStmts in stmt_list live in the context
- * denoted by the context field; the PlannedStmts live in the context denoted
- * by stmt_context. Separate contexts makes it easy to free a no-longer-needed
- * cached plan. (However, if is_oneshot is true, the context does not belong
- * solely to the CachedPlan so no freeing is possible.)
+ * subsidiary data live in the context denoted by the context field.
+ * This makes it easy to free a no-longer-needed cached plan. (However,
+ * if is_oneshot is true, the context does not belong solely to the CachedPlan
+ * so no freeing is possible.)
*/
typedef struct CachedPlan
{
@@ -165,7 +162,6 @@ typedef struct CachedPlan
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
- bool is_reused; /* is it a reused generic plan? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
@@ -174,10 +170,6 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
- MemoryContext stmt_context; /* context containing the PlannedStmts in
- * stmt_list, but not the List itself which is
- * in the above context; NULL if is_oneshot is
- * true. */
} CachedPlan;
/*
@@ -249,10 +241,6 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
-extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
- int query_index,
- QueryEnvironment *queryEnv);
-
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -265,30 +253,4 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
-/*
- * CachedPlanRequiresLocking: should the executor acquire additional locks?
- *
- * If the plan is a saved generic plan, the executor must acquire locks for
- * relations that are not covered by AcquireExecutorLocks(), such as partitions
- * that are subject to initial runtime pruning.
- */
-static inline bool
-CachedPlanRequiresLocking(CachedPlan *cplan)
-{
- return !cplan->is_oneshot && cplan->is_reused;
-}
-
-/*
- * CachedPlanValid
- * Returns whether a cached generic plan is still valid.
- *
- * Invoked by the executor to check if the plan has not been invalidated after
- * taking locks during the initialization of the plan.
- */
-static inline bool
-CachedPlanValid(CachedPlan *cplan)
-{
- return cplan->is_valid;
-}
-
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index ddee031f551..0b62143af8b 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,7 +138,6 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
- CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,8 +240,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan,
- CachedPlanSource *plansource);
+ CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.43.0
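For readers following the AcquireExecutorLocks() hunk above: the revert swaps a walk over plannedstmt->unprunableRelids for a scan of the whole plannedstmt->rtable that skips entries not naming a relation. The sketch below contrasts the two loop shapes in plain C. It is illustrative only and is not PostgreSQL code: ToyRte, toy_rte_needs_lock(), toy_lock_all() and toy_lock_unprunable() are hypothetical names, a uint64_t stands in for a Bitmapset, and "taking a lock" is modeled as appending the relid to an output array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are not the PostgreSQL definitions. */
typedef enum { TOY_RTE_RELATION, TOY_RTE_SUBQUERY, TOY_RTE_VALUES } ToyRteKind;

typedef struct
{
    ToyRteKind  kind;
    unsigned    relid;      /* 0 plays the role of InvalidOid */
} ToyRte;

/* Mirrors the rtekind test in the hunk: does this entry name a relation? */
static int
toy_rte_needs_lock(const ToyRte *rte)
{
    return rte->kind == TOY_RTE_RELATION ||
        (rte->kind == TOY_RTE_SUBQUERY && rte->relid != 0);
}

/* Post-revert shape: scan every range table entry, skip non-relations. */
static size_t
toy_lock_all(const ToyRte *rtable, size_t n, unsigned *out)
{
    size_t      nlocked = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!toy_rte_needs_lock(&rtable[i]))
            continue;
        out[nlocked++] = rtable[i].relid;
    }
    return nlocked;
}

/*
 * Pre-revert shape: visit only the RT indexes present in the "unprunable"
 * set.  RT indexes are 1-based, so bit k of the mask maps to rtable[k - 1].
 * The uint64_t mask limits this toy to RT indexes below 64.
 */
static size_t
toy_lock_unprunable(const ToyRte *rtable, size_t n, uint64_t unprunable,
                    unsigned *out)
{
    size_t      nlocked = 0;

    for (size_t rtindex = 1; rtindex <= n && rtindex < 64; rtindex++)
    {
        if (!(unprunable & (UINT64_C(1) << rtindex)))
            continue;
        /* Every member of the set is expected to name a relation. */
        assert(toy_rte_needs_lock(&rtable[rtindex - 1]));
        out[nlocked++] = rtable[rtindex - 1].relid;
    }
    return nlocked;
}
```

With a four-entry range table in which only RT index 1 is marked unprunable, toy_lock_unprunable() records a single relid, while toy_lock_all() records every entry that names a relation. The relids in that gap correspond to the locks the reverted code left to executor startup, after initial pruning.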
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-22 15:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
2025-02-22 17:02 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-23 08:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-23 12:46 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-25 02:51 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 03:06 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-20 13:25 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 15:38 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-21 10:22 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2025-05-22 08:12 ` Amit Langote <[email protected]>
2025-05-22 13:04 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2025-06-20 12:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 2 replies; 66+ messages in thread
From: Amit Langote @ 2025-05-22 08:12 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, May 21, 2025 at 7:22 PM Amit Langote <[email protected]> wrote:
> Fair enough. I’ll revert this and some related changes shortly. WIP
> patch attached.
I have pushed out the revert now.
Note that I’ve only reverted the changes related to deferring locks on
prunable partitions. I’m planning to leave the preparatory commits
leading up to that one in place unless anyone objects. For reference,
here they are in chronological order (the last 3 are bug fixes):
bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
cbc127917e0 Track unpruned relids to avoid processing pruned relations
75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
28317de723b Ensure first ModifyTable rel initialized if all are pruned
I think separating initial pruning from plan node initialization is
still worthwhile on its own, as evidenced by the improvements in
cbc127917e.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-05-22 13:04 ` Tomas Vondra <[email protected]>
2025-05-23 02:17 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
1 sibling, 1 reply; 66+ messages in thread
From: Tomas Vondra @ 2025-05-22 13:04 UTC (permalink / raw)
To: Amit Langote <[email protected]>; Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On 5/22/25 10:12, Amit Langote wrote:
> On Wed, May 21, 2025 at 7:22 PM Amit Langote <[email protected]> wrote:
>> Fair enough. I’ll revert this and some related changes shortly. WIP
>> patch attached.
>
> I have pushed out the revert now.
>
Thank you.
> Note that I’ve only reverted the changes related to deferring locks on
> prunable partitions. I’m planning to leave the preparatory commits
> leading up to that one in place unless anyone objects. For reference,
> here they are in chronological order (the last 3 are bug fixes):
>
> bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> cbc127917e0 Track unpruned relids to avoid processing pruned relations
> 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> 28317de723b Ensure first ModifyTable rel initialized if all are pruned
>
> I think separating initial pruning from plan node initialization is
> still worthwhile on its own, as evidenced by the improvements in
> cbc127917e.
>
I'm OK with that in principle, assuming the benefits outweigh the risk
of making backpatching harder. The patches don't seem exceptionally
large / invasive, but I don't know how often we modify these parts.
regards
--
Tomas Vondra
* Re: generic plans and "initial" pruning
@ 2025-05-23 02:17 ` Amit Langote <[email protected]>
0 siblings, 0 replies; 66+ messages in thread
From: Amit Langote @ 2025-05-23 02:17 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, May 22, 2025 at 10:04 PM Tomas Vondra <[email protected]> wrote:
> On 5/22/25 10:12, Amit Langote wrote:
> > Note that I’ve only reverted the changes related to deferring locks on
> > prunable partitions. I’m planning to leave the preparatory commits
> > leading up to that one in place unless anyone objects. For reference,
> > here they are in chronological order (the last 3 are bug fixes):
> >
> > bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> > d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> > cbc127917e0 Track unpruned relids to avoid processing pruned relations
> > 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> > cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> > 28317de723b Ensure first ModifyTable rel initialized if all are pruned
> >
> > I think separating initial pruning from plan node initialization is
> > still worthwhile on its own, as evidenced by the improvements in
> > cbc127917e.
> >
>
> I'm OK with that in principle, assuming the benefits outweigh the risk
> of making backpatching harder. The patches don't seem exceptionally
> large / invasive, but I don't know how often we modify these parts.
Thanks. I agree it's something to be mindful of, but I don’t expect
the reimplementation of the locking deferral to require changes to
this part of the code again. So barring any surprises, it shouldn't be
the case that the pruning code ends up looking significantly different
in v19.
Also, the actual pruning logic hasn’t changed much -- just where it’s
called from.
Let me know if any of that still raises concerns.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-06-20 12:30 ` Amit Langote <[email protected]>
2025-07-17 12:11 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
1 sibling, 1 reply; 66+ messages in thread
From: Amit Langote @ 2025-06-20 12:30 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, May 22, 2025 at 5:12 PM Amit Langote <[email protected]> wrote:
> I have pushed out the revert now.
>
> Note that I’ve only reverted the changes related to deferring locks on
> prunable partitions. I’m planning to leave the preparatory commits
> leading up to that one in place unless anyone objects. For reference,
> here they are in chronological order (the last 3 are bug fixes):
>
> bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> cbc127917e0 Track unpruned relids to avoid processing pruned relations
> 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> 28317de723b Ensure first ModifyTable rel initialized if all are pruned
>
> I think separating initial pruning from plan node initialization is
> still worthwhile on its own, as evidenced by the improvements in
> cbc127917e.

I've been thinking about how to address the concerns Tom raised about
the reverted patch. Here's a summary of where my thinking currently
stands.

* CachedPlan invalidation handling:

The first issue is the part of the old design where a CachedPlan
invalidated during executor startup -- while locking unpruned
partitions -- was modified in place to replace the stale PlannedStmts
in its stmt_list with new ones obtained by replanning all queries in
the enclosing CachedPlanSource's query_list. I did that mainly to
ensure that replanning happens as soon as the executor discovers the
plan is invalid, instead of returning to the caller and requiring them
to go back to plancache.c to trigger replanning. There were many
issues with making that approach work in practice, because different
callers of the executor have different ways of running plans from a
CachedPlan -- with pquery.c in particular being hard to refactor
cleanly to support that flow.

The first alternative I came up with is to place only the query whose
PlannedStmt is being initialized into a standalone CachedPlanSource
and create a corresponding standalone CachedPlan. "Standalone" here
means that both objects are "saved" independently of the original
CachedPlanSource and CachedPlan, but are still tracked by the
invalidation callbacks.

But thinking about it more recently, what's actually important is not
whether we construct a new CachedPlan at all, but simply that we
replan just the one query that needs to be run, and use the resulting
PlannedStmt directly. The planner will have taken all required locks,
so we don't need to register the plan with the invalidation machinery
-- concurrent invalidations can't affect correctness.

In that case, the replanned PlannedStmt can be treated as transient
executor-local state, with no need to carry any of the plan cache
infrastructure along with it. To support that, I further assume that,
because replanning and execution happen essentially back-to-back,
there's no opportunity for role-based or xmin-based invalidation (as
is checked for a CachedPlan in CheckCachedPlan()) to affect the plan
in between. If that reasoning holds, then we don't need to register
the replanned statement with the invalidation machinery at all.

Because we wouldn't have touched the original CachedPlan at all, the
stale PlannedStmts in it wouldn't be replaced until the next
GetCachedPlan() call triggers replanning. I'm willing to accept that
as a tradeoff for a less invasive design to handle replanning in the
executor.

Finally, it's worth noting that the executor is always passed the
entire CachedPlan, regardless of which individual statement is being
executed. Without per-statement validity tracking, the executor cannot
tell whether replanning is actually needed for a given query when the
CachedPlan is marked invalid (is_valid=false), which makes it
impossible to selectively replan just one. To support that, what I
would need is validity tracking at the level of individual
PlannedStmts -- and perhaps even Querys -- in the source's query_list,
with the current is_valid flag effectively serving as the logical AND
of all the individual flags. We didn't need that in the old design,
because we'd replace all statements to mark the CachedPlan valid again
-- though Tom was right to point out flaws in the assumption that
setting is_valid like that was actually safe.
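
As a rough illustration of the per-statement validity idea, here is a
minimal standalone sketch. It is not PostgreSQL code: ToyCachedPlan,
stmt_valid, and the toy_* functions are hypothetical names invented for
this example, and the real CachedPlan has no such per-statement array
today. It only shows the plan-level flag being derived as the logical
AND of per-statement flags, so that one statement can be singled out
for replanning.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_STMTS 8

/* Hypothetical stand-in for a CachedPlan with per-statement validity. */
typedef struct ToyCachedPlan
{
    int  nstmts;                      /* number of statements in the plan */
    bool stmt_valid[TOY_MAX_STMTS];   /* one validity flag per statement */
} ToyCachedPlan;

/* Plan-level validity: the logical AND of the per-statement flags. */
static bool
toy_plan_is_valid(const ToyCachedPlan *plan)
{
    for (int i = 0; i < plan->nstmts; i++)
    {
        if (!plan->stmt_valid[i])
            return false;
    }
    return true;
}

/* An invalidation that affects only the statement at query_index. */
static void
toy_invalidate_stmt(ToyCachedPlan *plan, int query_index)
{
    assert(query_index >= 0 && query_index < plan->nstmts);
    plan->stmt_valid[query_index] = false;
}

/* Executor-side question: must this particular statement be replanned? */
static bool
toy_stmt_needs_replan(const ToyCachedPlan *plan, int query_index)
{
    assert(query_index >= 0 && query_index < plan->nstmts);
    return !plan->stmt_valid[query_index];
}
```

With flags like these, a plan-level "invalid" no longer forces every
statement to be replanned; only the statement about to run is checked.
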

* ExecutorStart() interface damage control:

The other aspect I’ve been thinking about is how to contain the
changes required inside ExecutorStart(), and limit the disruption to
ExecutorStart_hooks in particular, while keeping changes for outside
callers narrowly scoped. In the previous patch, pruning, locking, and
invalidation checking were all done inside InitPlan(), which is called
by standard_ExecutorStart() -- an implementation choice that was
potentially disruptive to extensions using ExecutorStart_hook. Since
such hooks are expected to call standard_ExecutorStart() to perform
core plan initialization, they would have to check afterward whether
the plan had actually been initialized successfully, in case an
invalidation occurred during InitPlan(). That wasn’t optional, and it
made it easy for hook authors to miss the fact that
standard_ExecutorStart() could return without initializing the plan,
breaking expectations that were previously reliable.

Separately, for top-level callers of the executor, the patch
introduced a new entry point, ExecutorStartCachedPlan(), to avoid
requiring each caller to implement its own replanning loop. But that
approach was also awkward, since it required switching to a
nonstandard function just to get correct behavior.

What I’m thinking now is that we should instead move the logic for
pruning, deferred locking, and replanning directly into
ExecutorStart() itself, so that callers no longer have to choose
between ExecutorStart() and a separate entry point that exists solely
to handle invalidation and replanning.

With that design, there is no need for new executor entry points --
ExecutorStart() retains its original signature and behavior, with the
added benefit that replanning and pruning are now handled internally
before hooks or standard initialization logic are invoked. The design
requires moving some code from standard_ExecutorStart() --
specifically the code that sets up the EState and parameters -- and
from InitPlan() -- namely, the parts that initialize the range table,
partition pruning state, and perform ExecDoInitialPruning().

Callers of ExecutorStart() do still need to pass the CachedPlan, the
CachedPlanSource, and the query_index to CreateQueryDesc() so that
they are available in the QueryDesc; beyond that, the executor’s
external API remains unchanged.

Importantly, this restructuring would not require any behavioral
changes for existing ExecutorStart_hook implementations. From a hook’s
point of view, this is a code motion change only. Hooks are still
invoked at the same point, but they’re now guaranteed to receive a
plan that is valid and ready for execution. This avoids the control
flow surprises introduced by the reverted patch -- specifically, the
need for hooks to detect whether standard_ExecutorStart() had
completed successfully -- while preserving the executor’s API and
execution contract as they exist in master.
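
The control flow described above can be sketched with a small
standalone model. Everything here is hypothetical and deliberately
simplified: ToyQueryDesc, toy_start_hook, and the toy_* functions are
made-up names, not the real executor API, and "replanning" is reduced
to bumping a counter. The sketch only shows the ordering: pruning and
deferred locking run first, a stale plan is replanned on the spot, and
the hook (or the standard initialization) runs only afterward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for QueryDesc. */
typedef struct ToyQueryDesc
{
    int  plan_generation;            /* bumped each time we replan */
    bool plan_valid;                 /* false once an invalidation arrives */
    bool initialized;                /* set by the standard init step */
    bool invalidate_during_locking;  /* knob: simulate a concurrent invalidation */
} ToyQueryDesc;

typedef void (*toy_start_hook_type) (ToyQueryDesc *qd);
static toy_start_hook_type toy_start_hook = NULL;
static int toy_hook_calls = 0;

/* Initial pruning plus locking of surviving partitions; false if plan went stale. */
static bool
toy_prune_and_lock(ToyQueryDesc *qd)
{
    if (qd->invalidate_during_locking)
    {
        qd->invalidate_during_locking = false;
        qd->plan_valid = false;
    }
    return qd->plan_valid;
}

/* Replan just this one statement; the planner is assumed to take all locks. */
static void
toy_replan(ToyQueryDesc *qd)
{
    qd->plan_generation++;
    qd->plan_valid = true;
}

/* Stand-in for standard initialization: it never sees an invalid plan. */
static void
toy_standard_start(ToyQueryDesc *qd)
{
    assert(qd->plan_valid);
    qd->initialized = true;
}

/* Example hook: counts calls, then delegates to the standard step. */
static void
toy_sample_hook(ToyQueryDesc *qd)
{
    toy_hook_calls++;
    toy_standard_start(qd);
}

/* Entry point: validity is settled before the hook or standard init runs. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
    if (!toy_prune_and_lock(qd))
        toy_replan(qd);

    if (toy_start_hook != NULL)
        toy_start_hook(qd);
    else
        toy_standard_start(qd);
}
```

In this model the hook needs no "did initialization actually happen?"
check afterward, which is the property the proposal is after.
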
I’ll hold off on writing any code for now -- just wanted to lay out
this direction and hear what others think, especially Tom.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-07-17 12:11 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Fri, Jun 20, 2025 at 9:30 PM Amit Langote <[email protected]> wrote:
> [...]

The refinements I described in my email above might help mitigate some
of those executor-related issues. However, I'm starting to wonder if
it's worth reconsidering our decision to handle pruning, locking, and
validation entirely at executor startup, which was the approach taken
in the reverted patch.

The alternative approach, doing initial pruning and locking within
plancache.c itself (which I floated a while ago), might be worth
revisiting. It avoids the complications we've discussed around the
executor API and preserves the clear separation of concerns that
plancache.c provides, though it does introduce some new layering
concerns, which I describe further below.

To support this, we'd need a mechanism to pass pruning results to the
executor alongside each PlannedStmt. For each PartitionPruneInfo in
the plan, that would include the corresponding PartitionPruneState and
the bitmapset of surviving relids determined by initial pruning. Given
that a CachedPlan can contain multiple PlannedStmts, this would
effectively be a list of pruning results, one per statement. One
reasonable way to handle that might be to define a parallel data
structure, separate from PlannedStmt, constructed by plancache.c and
carried via QueryDesc. The memory and lifetime management would mirror
how ParamListInfo is handled today, leaving the executor API unchanged
and avoiding intrusive changes to PlannedStmt.
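
One possible shape for such a parallel structure is sketched below as
standalone C. All names (ToyPruneResult, ToyStmtPruneResults,
ToyQueryDesc, toy_rel_survives) are hypothetical, a 64-bit mask stands
in for a Bitmapset of relids, and the PartitionPruneState that would
accompany each result is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PRUNEINFOS 4

/* Result of initial pruning for one PartitionPruneInfo (hypothetical). */
typedef struct ToyPruneResult
{
    int      part_prune_index;   /* which pruning step this result is for */
    uint64_t surviving_relids;   /* bit i set => range-table index i survived */
} ToyPruneResult;

/* All pruning results for one statement of a cached plan. */
typedef struct ToyStmtPruneResults
{
    int            nresults;
    ToyPruneResult results[TOY_MAX_PRUNEINFOS];
} ToyStmtPruneResults;

/* Simplified QueryDesc: carries results built elsewhere, does not own them. */
typedef struct ToyQueryDesc
{
    int                        query_index;
    const ToyStmtPruneResults *prune_results;   /* NULL if no early pruning */
} ToyQueryDesc;

/* Executor-side lookup: did relation rti survive the given pruning step? */
static bool
toy_rel_survives(const ToyQueryDesc *qd, int part_prune_index, int rti)
{
    const ToyStmtPruneResults *res = qd->prune_results;

    assert(rti >= 0 && rti < 64);
    if (res == NULL)
        return true;            /* nothing was pruned ahead of time */

    for (int i = 0; i < res->nresults; i++)
    {
        if (res->results[i].part_prune_index == part_prune_index)
            return ((res->results[i].surviving_relids >> rti) & 1) != 0;
    }
    return true;                /* no result recorded for this step */
}
```

Keeping the results outside the plan itself is what lets the cached
PlannedStmt stay read-only and shareable across executions.
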

However, one potentially problematic aspect of this design is managing
the lifecycle of the relations referenced by PartitionPruneState.
Currently, partitioned table relations are opened by the executor
after entering ExecutorStart() and closed automatically by
ExecEndPlan(), so pruning state is cleaned up implicitly. If we
perform initial pruning earlier, we'd need to keep these relations
open longer, necessitating explicit cleanup calls (e.g., a new
FinishPartitionPruneState()) invoked either from ExecutorEnd() or by
higher-level callers of the executor. This introduces some
questionable layering by shifting responsibility for relation
management tasks, which ideally belong within the executor, into its
callers.
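
The ownership shift can be illustrated with a tiny standalone model.
The names are hypothetical: toy_finish_prune_state() merely plays the
role of the FinishPartitionPruneState() suggested above, and "open
relations" are reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pruning state that keeps parent relations open. */
typedef struct ToyPruneState
{
    int  nrels_open;   /* partitioned tables opened for pruning */
    bool finished;     /* true once explicit cleanup has run */
} ToyPruneState;

/* Pruning done before executor startup opens the parent relations. */
static void
toy_create_prune_state(ToyPruneState *ps, int nparents)
{
    ps->nrels_open = nparents;
    ps->finished = false;
}

/* Explicit cleanup; safe to call more than once. */
static void
toy_finish_prune_state(ToyPruneState *ps)
{
    if (ps->finished)
        return;
    ps->nrels_open = 0;
    ps->finished = true;
}

/* Caller of the executor: it now has to remember the cleanup step. */
static int
toy_run_cached_stmt(int nparents)
{
    ToyPruneState ps;

    toy_create_prune_state(&ps, nparents);
    /* executor start, run, and end would happen here */
    toy_finish_prune_state(&ps);

    return ps.nrels_open;   /* relations still open after the run */
}
```

If the caller forgets the final call, the relations stay open, which is
the layering hazard described above.
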

My sense is that the complexity involved in carrying pruning results
via this parallel data structure was one of the concerns Tom raised
previously, alongside the significant pruning code refactoring that
the earlier patch required. The latter, at least, should no longer be
necessary given recent code improvements.

I think that's about as many approaches as I can think of, and would
really appreciate others' thoughts on these alternatives.

--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-07-22 06:43 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, Jul 17, 2025 at 9:11 PM Amit Langote <[email protected]> wrote:
> [...]

One point I forgot to mention about this approach is that we'd also
need to ensure permissions on parent relations are checked before
performing initial pruning in plancache.c, since pruning may involve
evaluating user-provided expressions. So in effect, we'd need to
invoke not just ExecDoInitialPruning(), but also
ExecCheckPermissions(), or some variant of it, prior to executor
startup. While manageable, it does add slightly to the complexity.
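
The ordering constraint can be sketched as a two-pass routine in
standalone C. ToyParentRel and toy_check_then_prune() are hypothetical
names, and the boolean flag merely stands in for whatever
ExecCheckPermissions() (or a variant of it) would verify.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ToyPruneStatus
{
    TOY_PRUNE_OK,
    TOY_PRUNE_DENIED
} ToyPruneStatus;

/* Hypothetical parent relation taking part in initial pruning. */
typedef struct ToyParentRel
{
    bool user_has_select;           /* outcome of the permission check */
    int  pruning_exprs_evaluated;   /* counts pruning expression runs */
} ToyParentRel;

/*
 * Check permissions on every parent first, and only then evaluate any
 * pruning expression, so no user-provided expression runs on behalf of
 * a user who may not access the table.
 */
static ToyPruneStatus
toy_check_then_prune(ToyParentRel *rels, int nrels)
{
    /* Pass 1: permission checks only; nothing is evaluated yet. */
    for (int i = 0; i < nrels; i++)
    {
        if (!rels[i].user_has_select)
            return TOY_PRUNE_DENIED;
    }

    /* Pass 2: every check passed, so pruning expressions may run. */
    for (int i = 0; i < nrels; i++)
        rels[i].pruning_exprs_evaluated++;

    return TOY_PRUNE_OK;
}
```

Doing the checks in a separate first pass means a failure on any parent
leaves every pruning expression unevaluated.
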
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-11-12 14:17 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi,
On Tue, Jul 22, 2025 at 3:43 PM Amit Langote <[email protected]> wrote:
> On Thu, Jul 17, 2025 at 9:11 PM Amit Langote <[email protected]> wrote:
> > The refinements I described in my email above might help mitigate some
> > of those executor-related issues. However, I'm starting to wonder if
> > it's worth reconsidering our decision to handle pruning, locking, and
> > validation entirely at executor startup, which was the approach taken
> > in the reverted patch.
> >
> > The alternative approach, doing initial pruning and locking within
> > plancache.c itself (which I floated a while ago), might be worth
> > revisiting. It avoids the complications we've discussed around the
> > executor API and preserves the clear separation of concerns that
> > plancache.c provides, though it does introduce some new layering
> > concerns, which I describe further below.
> >
> > To support this, we'd need a mechanism to pass pruning results to the
> > executor alongside each PlannedStmt. For each PartitionPruneInfo in
> > the plan, that would include the corresponding PartitionPruneState and
> > the bitmapset of surviving relids determined by initial pruning. Given
> > that a CachedPlan can contain multiple PlannedStmts, this would
> > effectively be a list of pruning results, one per statement. One
> > reasonable way to handle that might be to define a parallel data
> > structure, separate from PlannedStmt, constructed by plancache.c and
> > carried via QueryDesc. The memory and lifetime management would mirror
> > how ParamListInfo is handled today, leaving the executor API unchanged
> > and avoiding intrusive changes to PlannedStmt.
> >
> > However, one potentially problematic aspect of this design is managing
> > the lifecycle of the relations referenced by PartitionPruneState.
> > Currently, partitioned table relations are opened by the executor
> > after entering ExecutorStart() and closed automatically by
> > ExecEndPlan(), allowing cleanup of pruning states implicitly. If we
> > perform initial pruning earlier, we'd need to keep these relations
> > open longer, necessitating explicit cleanup calls (e.g., a new
> > FinishPartitionPruneState()) invoked by the caller of the executor,
> > such as from ExecutorEnd() or even higher-level callers. This
> > introduces some questionable layering by shifting responsibility for
> > relation management tasks, which ideally belong within the executor,
> > into its callers.
> >
> > My sense is that the complexity involved in carrying pruning results
> > via this parallel data structure was one of the concerns Tom raised
> > previously, alongside the significant pruning code refactoring that
> > the earlier patch required. The latter, at least, should no longer be
> > necessary given recent code improvements.
>
> One point I forgot to mention about this approach is that we'd also
> need to ensure permissions on parent relations are checked before
> performing initial pruning in plancache.c, since pruning may involve
> evaluating user-provided expressions. So in effect, we'd need to
> invoke not just ExecDoInitialPruning(), but also
> ExecCheckPermissions(), or some variant of it, prior to executor
> startup. While manageable, it does add slightly to the complexity.
Sorry for the absence. I've now implemented the approach mentioned
above and split it into a series of reasonably isolated patches.
The key idea is to avoid taking unnecessary locks when reusing a
cached plan. To achieve that, we need to perform initial partition
pruning during cached plan reuse in plancache.c so that only surviving
partitions are locked. This requires some plumbing to reuse the result
of this "early" pruning during executor startup, because repeating the
pruning logic would be both inefficient and potentially inconsistent
-- what if you get different results the second time? (I don't have
proof that this can happen, but some earlier emails mention the
theoretical risk, so better to be safe.)
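To make the locking argument concrete, here is a minimal toy sketch, not PostgreSQL code. It assumes a range-partitioned table modelled as bits in a 64-bit mask, and every name in it (ToyPlan, toy_initial_prune, toy_locks_taken) is hypothetical. It only shows why pruning before locking shrinks the lock set from "all partitions" to "surviving partitions".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy plan: partition i holds keys [i * range_width, (i + 1) * range_width). */
typedef struct ToyPlan
{
	int			nparts;			/* number of partitions, 1..64 */
	int			range_width;	/* keys per partition, > 0 */
} ToyPlan;

/* Toy "initial pruning": return a mask with only the partition that can hold key. */
uint64_t
toy_initial_prune(const ToyPlan *plan, int key)
{
	assert(plan->nparts >= 1 && plan->nparts <= 64 && plan->range_width > 0);
	if (key < 0 || key / plan->range_width >= plan->nparts)
		return 0;				/* no partition can match */
	return UINT64_C(1) << (key / plan->range_width);
}

/* Count the locks one plan reuse would take, with or without pruning first. */
int
toy_locks_taken(const ToyPlan *plan, int key, bool prune_first)
{
	uint64_t	to_lock;
	int			nlocks = 0;

	assert(plan->nparts >= 1 && plan->nparts <= 64);
	if (prune_first)
		to_lock = toy_initial_prune(plan, key);
	else if (plan->nparts == 64)
		to_lock = UINT64_MAX;
	else
		to_lock = (UINT64_C(1) << plan->nparts) - 1;

	/* Clear the lowest set bit until none remain; one lock per set bit. */
	while (to_lock != 0)
	{
		to_lock &= to_lock - 1;
		nlocks++;
	}
	return nlocks;
}
```

With 32 partitions of width 100, looking up key 250 takes 32 locks when pruning is skipped and 1 lock when pruning runs first, which is the effect the patch series is after for generic plans.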
So this patch introduces ExecutorPrep(), which allows executor
metadata such as initial pruning results (valid subplan indexes) and
full unpruned_relids to be computed ahead of execution and reused
later by ExecutorStart() and during QueryDesc setup in parallel
workers using the results shared by the leader. The parallel query bit
was discussed previously at [1], though I didn’t have a solution I
liked then.
This revives an idea that was last implemented in the patch (v30)
posted on Dec 16, 2022. In retrospect, I understand the hesitation Tom
might have had about the patch at the time -- its changes to enable
early pruning and then feed the results into ExecutorStart() were less
than pretty. Thanks to the initial pruning code refactoring that I
committed in Postgres 18, those changes now seem much more principled
and modular IMO.
The patch set is structured as follows:
* Refactor partition pruning initialization (0001): separates the
setup of the pruning state from its execution by introducing
ExecCreatePartitionPruneStates(). This makes the pruning logic easier
to reuse and adds flexibility to do only the setup but skip pruning in
some cases.
* Introduce ExecutorPrep infrastructure (0002): adds ExecutorPrep()
and ExecPrep as a formal way to perform executor setup ahead of
execution. This enables caching or transferring pruning results and
other metadata without triggering execution. ExecutorStart() can now
consume precomputed prep state from the EState created during
ExecutorPrep(). ExecPrepCleanup() handles cleanup when the plan is
invalidated during prep and so not executed; the state is cleaned up
in the regular ExecutorEnd() path otherwise.
* Allow parallel workers to reuse leader pruning results (0003): lets
workers reuse the leader’s initial pruning results (valid subplan
indexes) and unpruned_relids via ExecutorPrep(). This adds a
verification step to check that leader and worker decisions match,
throwing an error if they don’t -- so "reuse" is a bit of a lie.
Should that check be debug-only? (Maybe not.) As mentioned above, this
was previously discussed at [1].
* Enable pruning-aware locking in cached / generic plan reuse (0004):
extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
on each PlannedStmt in the CachedPlan, locking only surviving
partitions. Adds CachedPlanPrepData to pass this through plan cache
APIs and down to execution via QueryDesc. Also reinstates the
firstResultRel locking rule added in 28317de72 but later lost due to
revert of the earlier pruning patch, to ensure correctness when all
target partitions are pruned.
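On the verification step in 0003: the check is written with bms_nonempty_difference(), which, as far as I understand it, reports whether the first set has a member missing from the second. The sketch below uses plain bit masks and hypothetical toy_* names (none of this is PostgreSQL code) to show what that kind of test accepts compared with a strict equality test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a set difference test: true if a has a member that b lacks. */
bool
toy_nonempty_difference(uint64_t a, uint64_t b)
{
	return (a & ~b) != 0;
}

/* One-directional check: passes when the worker's set is a subset of the leader's. */
bool
toy_worker_within_leader(uint64_t worker, uint64_t leader)
{
	return !toy_nonempty_difference(worker, leader);
}

/* Stricter alternative: passes only when both sets are identical. */
bool
toy_worker_equals_leader(uint64_t worker, uint64_t leader)
{
	return worker == leader;
}
```

A worker result of {0, 2} against a leader result of {0, 1, 2} passes the one-directional check but fails the equality check, so which of the two is wanted depends on whether a worker pruning more than the leader should count as a mismatch.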
This approach keeps plan caching and validation logic self-contained
in plancache.c and avoids invasive executor API changes.
Benchmark results:
echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
for p in 32 64 128 256 512 1024; do
  pgbench -i --partitions=$p > /dev/null 2>&1
  echo -ne "$p\t"
  pgbench -n -S -T10 -Mprepared | grep tps
done
Master
32 tps = 23841.822407 (without initial connection time)
64 tps = 21578.619816 (without initial connection time)
128 tps = 18090.500707 (without initial connection time)
256 tps = 14152.248201 (without initial connection time)
512 tps = 9432.708423 (without initial connection time)
1024 tps = 5873.696475 (without initial connection time)
Patched
32 tps = 24724.245798 (without initial connection time)
64 tps = 24858.206407 (without initial connection time)
128 tps = 24652.655269 (without initial connection time)
256 tps = 23656.756615 (without initial connection time)
512 tps = 22299.865769 (without initial connection time)
1024 tps = 21911.704317 (without initial connection time)
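For reading the numbers above as ratios, a small sketch that hard-codes the reported tps values and divides patched by master. The struct and function names are made up for illustration; only the figures come from the runs above.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark row: partition count plus tps on master and with the patches. */
typedef struct BenchRow
{
	int			partitions;
	double		master_tps;
	double		patched_tps;
} BenchRow;

/* Values copied from the pgbench output above. */
const BenchRow bench_rows[] = {
	{32, 23841.822407, 24724.245798},
	{64, 21578.619816, 24858.206407},
	{128, 18090.500707, 24652.655269},
	{256, 14152.248201, 23656.756615},
	{512, 9432.708423, 22299.865769},
	{1024, 5873.696475, 21911.704317},
};
const size_t bench_nrows = sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Speedup of patched over master for one row. */
double
bench_speedup(const BenchRow *row)
{
	assert(row->master_tps > 0.0);
	return row->patched_tps / row->master_tps;
}
```

The ratio is about 1.04x at 32 partitions and about 3.7x at 1024 partitions, i.e. patched throughput stays roughly flat while master degrades as the partition count grows.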
Comments welcome.
[1] https://www.postgresql.org/message-id/CA%2BHiwqFA%3DswkzgGK8AmXUNFtLeEXFJwFyY3E7cTxvL46aa1OTw%40mail...
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v1-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.0K, 2-v1-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From d23a05d6f412dcbfd38a910331527765999d78e9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v1 3/4] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.
While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..f16ef184c68 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 187a480e508..3b450e3373f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1872,6 +1872,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
[application/octet-stream] v1-0001-Refactor-partition-pruning-initialization-for-cla.patch (7.7K, 3-v1-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 243d407de86b0a73b9bd8c8dbc541f630eb33747 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v1 1/4] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
2 files changed, 43 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aa12e9ad2ea..88b150c8d77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1772,6 +1771,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1796,6 +1798,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1803,11 +1828,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1825,20 +1850,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1846,8 +1864,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1965,14 +1981,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2206,8 +2220,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2219,8 +2233,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
[application/octet-stream] v1-0004-Use-pruning-aware-locking-in-cached-plans.patch (25.0K, 4-v1-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From ddffccd68513bb0e68d6cf75810cf64cf9a4d757 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v1 4/4] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.
This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
---
src/backend/commands/prepare.c | 15 +-
src/backend/executor/functions.c | 14 +-
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 22 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 7 +-
src/backend/utils/cache/plancache.c | 223 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 7 +
src/include/utils/plancache.h | 23 ++-
11 files changed, 299 insertions(+), 23 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..10fdff403b9 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,9 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +208,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +578,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +637,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +660,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..8fc22fbd283 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,10 +698,13 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
- NULL);
+ NULL,
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -719,9 +725,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i) :
+ NULL;
execution_state *newes;
/*
@@ -763,6 +772,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,7 +1372,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4c5647ac38a..c5812612f8d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4648,8 +4648,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..72d52baff4b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1689,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* XXX - need copy? */
cplan);
/*
@@ -2078,6 +2082,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2106,12 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2509,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2586,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d3964a12a14..82972beee70 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2037,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..ebcf601fce7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,32 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a plan according to
+ * the specified policy.
+ *
+ * The policy determines whether all relations or only unpruned ones are locked.
+ * For LOCK_UNPRUNED, ExecutorPrep is invoked to identify surviving partitions
+ * and its result is populated in cprep.
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2016,153 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * This function uses ExecutorPrep to identify which partitions survive
+ * initial runtime pruning and locks only those, along with any unprunable
+ * base relations. During acquire, the resulting ExecPrep objects are stored
+ * in cprep->prep_list for later reuse. During release, those same ExecPrep
+ * objects are used to identify what to unlock.
+ *
+ * Unlike AcquireExecutorLocks(), which locks all relations listed in the
+ * PlannedStmt's rtable (LOCK_ALL policy), this function selectively locks
+ * only those rels that may be referenced during execution.
+ *
+ * prep_list is extended during acquire and must match stmt_list during
+ * release. Memory allocation happens in cprep->context.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ /* Check cprep before dereferencing it to switch memory contexts. */
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 30d889b54c5..6fb86dc05f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..42b51299ece 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,13 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE
+ */
+ /* integer list of RT indexes, or NIL */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..59f0b0fc4a4 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,26 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * Populated by GetCachedPlan() when ExecutorPrep is run on a generic plan.
+ *
+ * prep_list: results from ExecutorPrep(), one per PlannedStmt
+ * params: parameters that may be used during ExecutorPrep (e.g., pruning)
+ * context: memory context to allocate ExecutorPrep results in
+ * owner: resource owner to associate ExecutorPrep resources with
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* List of ExecPrep */
+ ParamListInfo params;
+ MemoryContext context;
+ ResourceOwner owner;
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +260,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
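For readers following the LOCK_UNPRUNED path in the patch above, the relid arithmetic done in AcquireExecutorLocksUnpruned() after the unprunable relations are locked can be sketched in standalone C. This is only an illustration under stated assumptions: RelidSet, relid_bit() and extra_lock_relids() are hypothetical stand-ins for Bitmapset and the patch's inline logic, not PostgreSQL APIs, and range-table indexes are limited to 1..63 so that a single 64-bit word can hold the set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of range-table indexes (1..63). */
typedef uint64_t RelidSet;

/* Return the set containing only the given range-table index. */
RelidSet
relid_bit(int rti)
{
    assert(rti >= 1 && rti < 64);
    return (RelidSet) 1 << rti;
}

/*
 * Compute the relids handled by the second LockRelids() call in the patch:
 * the partitions that survived initial pruning, minus the unprunable relids
 * that were already locked, plus the first result relation of every
 * ModifyTable node (which must be locked even if it was pruned).
 */
RelidSet
extra_lock_relids(RelidSet es_unpruned_relids, RelidSet unprunable_relids,
                  const int *first_result_rels, int n_first_result_rels)
{
    /* Equivalent of bms_difference(es_unpruned_relids, unprunableRelids). */
    RelidSet lock_relids = es_unpruned_relids & ~unprunable_relids;

    /* Equivalent of the bms_add_member() loop over firstResultRels. */
    for (int i = 0; i < n_first_result_rels; i++)
        lock_relids |= relid_bit(first_result_rels[i]);

    return lock_relids;
}
```

With unpruned relids {1, 2, 5}, unprunable relids {1, 2} and a first result relation of 4, the function yields {4, 5}. Like the patch, the sketch tests membership only against lock_relids, so a first result relation that is also unprunable ends up in the set again; the acquire and release paths stay symmetric because both go through the same computation.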
[application/octet-stream] v1-0002-Introduce-ExecutorPrep-infrastructure-for-pre-exe.patch (29.9K, 5-v1-0002-Introduce-ExecutorPrep-infrastructure-for-pre-exe.patch)
From e9689618f2889f224eb62e9ff4fb5251285ecdb3 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v1 2/4] Introduce ExecutorPrep infrastructure for
pre-execution setup

Add ExecutorPrep() and ExecPrep to support setting up executor
metadata like range table initialization and partition pruning
ahead of actual execution. This enables execution paths to
perform setup independently of running the plan.

For example, plan validation can compute and consume this
metadata without executing the query. Parallel query workers
can receive pre-initialized state from the leader and pass it
to ExecutorStart, avoiding redundant setup.

ExecutorStart now accepts a prep-estate from QueryDesc to skip
repeating initialization. The ExecPrep wrapper manages cleanup
and signals ownership of the estate. PrepPlan() encapsulates
shared setup logic.

Call sites, including Portal, SPI, and EXPLAIN, are updated to
support passing down the prep data. These changes are mostly
mechanical and clarify the separation between setup and actual
execution.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 9 +-
src/backend/executor/execMain.c | 192 +++++++++++++++++++++++----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 10 ++
src/include/nodes/execnodes.h | 55 ++++++++
src/include/utils/portal.h | 2 +
21 files changed, 308 insertions(+), 38 deletions(-)
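The ownership handoff described in the commit message (ExecutorStart adopting the prep's EState, ExecPrepCleanup freeing it only if nobody adopted it) may be easier to follow as a small standalone sketch. Everything here is a hypothetical mock: MockEState, MockPrep and the mock_* functions are illustrative names, not the patch's types, and malloc-style allocation stands in for PostgreSQL memory contexts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not PostgreSQL's real EState or ExecPrep. */
typedef struct MockEState
{
    int         dummy;
} MockEState;

typedef struct MockPrep
{
    MockEState *prep_estate;
    bool        owns_estate;    /* false once the executor adopts the state */
} MockPrep;

static int  live_estates = 0;   /* allocated-but-not-yet-freed states */

int
mock_live_estates(void)
{
    return live_estates;
}

MockEState *
mock_create_estate(void)
{
    MockEState *estate = calloc(1, sizeof(MockEState));

    if (estate == NULL)
        abort();
    live_estates++;
    return estate;
}

void
mock_free_estate(MockEState *estate)
{
    free(estate);
    live_estates--;
}

/* Analogue of ExecutorPrep(): build state up front; the wrapper owns it. */
MockPrep *
mock_prep(void)
{
    MockPrep   *prep = calloc(1, sizeof(MockPrep));

    if (prep == NULL)
        abort();
    prep->prep_estate = mock_create_estate();
    prep->owns_estate = true;
    return prep;
}

/* Analogue of the ExecutorStart change: adopt the prep state if present. */
MockEState *
mock_executor_start(MockPrep *prep)
{
    if (prep != NULL)
    {
        prep->owns_estate = false;
        return prep->prep_estate;
    }
    return mock_create_estate();
}

/* Analogue of ExecPrepCleanup(): free the state only if still owned. */
void
mock_prep_cleanup(MockPrep *prep)
{
    if (prep == NULL)
        return;
    if (prep->prep_estate != NULL && prep->owns_estate)
        mock_free_estate(prep->prep_estate);
}
```

The point of the owns_estate flag is that cleanup can be called unconditionally on both paths: when validation fails and the prep is discarded, the state is freed; when the executor has adopted it, cleanup leaves it alone and the executor's own teardown is responsible for it.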
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 93ef1ad106f..3cca6d45ec1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..6e481398f18 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [Optional] ExecutorPrep
+ - May be run before ExecutorStart (e.g., for plan validation).
+ - Performs range table initialization, permission checks, and
+ initial partition pruning.
+ - Returns an ExecPrep wrapper with EState that ExecutorStart may
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..1b96b251c34 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -75,6 +75,7 @@ ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
static void InitPlan(QueryDesc *queryDesc, int eflags);
+static void PrepPlan(EState *estate, bool do_initial_pruning);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -171,8 +172,24 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep)
+ {
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+ }
+ else
+ estate = CreateExecutorState();
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +280,143 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart(); see InitPlan().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later. Returns NULL if no prep is needed (e.g. no pruning).
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ Assert(pstmt->commandType != CMD_UTILITY);
+
+ /* No pruning needed -- let normal ExecutorStart handle setup later. */
+ if (pstmt->partPruneInfos == NIL)
+ return NULL;
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ PrepPlan(estate, do_initial_pruning);
+
+ CurrentResourceOwner = oldowner;
+
+ return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * PrepPlan: initialize executor metadata needed before plan execution.
+ *
+ * Sets up permissions, range table, and partition pruning infrastructure.
+ * If do_initial_pruning is true, performs initial pruning and stores the
+ * resulting subplan indexes in es_part_prune_results. Otherwise, this step
+ * is skipped, typically when results are provided externally (e.g., in
+ * parallel workers).
+ *
+ * Called from both ExecutorPrep() and InitPlan().
+ */
+static void
+PrepPlan(EState *estate, bool do_initial_pruning)
+{
+ PlannedStmt *pstmt = estate->es_plannedstmt;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false; they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +978,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,7 +991,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
@@ -846,29 +998,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
int i;
/*
- * Do permissions checks
+ * If ExecutorPrep() was not run earlier (e.g., during plan validation),
+ * perform InitPlan setup: init range table, check permissions, and run
+ * initial pruning. Otherwise, the executor will reuse the same information
+ * in queryDesc->prep->prep_estate.
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ if (queryDesc->prep == NULL)
+ {
+ estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ PrepPlan(estate, true);
+ }
+ else
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88b150c8d77..187a480e508 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2368,6 +2368,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2bd89102686..d3964a12a14 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..bc90d0ea7ee 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,15 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..f569be3853f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,61 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * This is used when we want to perform executor setup steps -- such as
+ * initializing the range table, checking permissions, and executing initial
+ * partition pruning -- ahead of actual plan execution. A typical use case is
+ * in plan validation logic (e.g., when deciding whether to reuse a generic
+ * cached plan), where we need to determine exactly which partitions will be
+ * scanned and locked, without executing the full plan.
+ *
+ * The executor may later adopt the prepared EState (via ExecutorStart),
+ * avoiding redundant setup. In that case, the executor is responsible for
+ * freeing the state and ExecPrepCleanup() will skip it.
+ */
+struct ExecPrep;
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(struct ExecPrep *prep);
+
+/* ExecutorPrep output */
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
^ permalink raw reply [nested|flat] 66+ messages in thread
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-22 15:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
2025-02-22 17:02 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-23 08:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-23 12:46 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-25 02:51 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 03:06 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-20 13:25 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 15:38 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-21 10:22 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-22 08:12 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-06-20 12:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-17 12:11 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-22 06:43 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-12 14:17 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2025-11-17 12:50 ` Amit Langote <[email protected]>
2025-11-20 07:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2025-11-17 12:50 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]> wrote:
> The key idea is to avoid taking unnecessary locks when reusing a
> cached plan. To achieve that, we need to perform initial partition
> pruning during cached plan reuse in plancache.c so that only surviving
> partitions are locked. This requires some plumbing to reuse the result
> of this "early" pruning during executor startup, because repeating the
> pruning logic would be both inefficient and potentially inconsistent
> -- what if you get different results the second time? (I don't have
> proof that this can happen, but some earlier emails mention the
> theoretical risk, so better to be safe.)
>
> So this patch introduces ExecutorPrep(), which allows executor
> metadata such as initial pruning results (valid subplan indexes) and
> full unpruned_relids to be computed ahead of execution and reused
> later by ExecutorStart() and during QueryDesc setup in parallel
> workers using the results shared by the leader. The parallel query bit
> was discussed previously at [1], though I didn’t have a solution I
> liked then.
>
...
> The patch set is structured as follows:
>
> * Refactor partition pruning initialization (0001): separates the
> setup of the pruning state from its execution by introducing
> ExecCreatePartitionPruneStates(). This makes the pruning logic easier
> to reuse and adds flexibility to do only the setup but skip pruning in
> some cases.
>
> * Introduce ExecutorPrep infrastructure (0002): adds ExecutorPrep()
> and ExecPrep as a formal way to perform executor setup ahead of
> execution. This enables caching or transferring pruning results and
> other metadata without triggering execution. ExecutorStart() can now
> consume precomputed prep state from the EState created during
> ExecutorPrep(). ExecPrepCleanup() handles cleanup when the plan is
> invalidated during prep and so not executed; the state is cleaned up
> in the regular ExecutorEnd() path otherwise.
In the v1 patch, I had not made ExecutorStart() call ExecutorPrep() to
do the prep work (creating EState, setting up es_relations, checking
permissions) when QueryDesc did not carry the results of
ExecutorPrep() from some earlier stage. Instead, InitPlan() would
detect that prep was absent and perform the missing setup itself. On
second thought, it is cleaner for ExecutorStart() to detect the absence
of prep and call ExecutorPrep() directly, matching how prep would be
created when coming from plancache et al.
v2 changes the patch to do that.
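To make the intended control flow concrete, here is a minimal toy sketch of "run prep only if nobody did it earlier, then adopt the state". It is not code from the patch: ToyEState, ToyExecPrep, ToyQueryDesc and the toy_* functions are hypothetical stand-ins that only borrow the prep_estate and owns_estate field names from ExecPrep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are the real PostgreSQL structs. */
typedef struct ToyEState
{
    bool        prep_done;      /* set once the prep step has run */
} ToyEState;

typedef struct ToyExecPrep
{
    ToyEState  *prep_estate;    /* state built by the prep step */
    bool        owns_estate;    /* true until the executor adopts it */
} ToyExecPrep;

typedef struct ToyQueryDesc
{
    ToyExecPrep *prep;          /* NULL if no earlier stage ran prep */
    ToyEState  *estate;         /* state the executor actually uses */
} ToyQueryDesc;

/* Stand-in for ExecutorPrep(): build the state ahead of execution. */
ToyExecPrep *
toy_executor_prep(void)
{
    ToyExecPrep *prep = calloc(1, sizeof(ToyExecPrep));
    ToyEState  *estate = calloc(1, sizeof(ToyEState));

    if (prep == NULL || estate == NULL)
        abort();
    estate->prep_done = true;
    prep->prep_estate = estate;
    prep->owns_estate = true;
    return prep;
}

/* Stand-in for ExecutorStart(): prep on demand, then adopt the state. */
void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->prep == NULL)
        qd->prep = toy_executor_prep();

    qd->estate = qd->prep->prep_estate;
    qd->prep->owns_estate = false;  /* executor frees the state from now on */
}

/* Stand-in for ExecPrepCleanup(): free only what was never adopted. */
void
toy_prep_cleanup(ToyExecPrep *prep)
{
    if (prep == NULL)
        return;
    if (prep->prep_estate != NULL && prep->owns_estate)
        free(prep->prep_estate);
    free(prep);
}
```

Both entry paths end up in the same state: a QueryDesc that arrives without prep gets one created inside the start routine, while one that arrives with prep reuses it unchanged.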
> * Enable pruning-aware locking in cached / generic plan reuse (0004):
> extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
> on each PlannedStmt in the CachedPlan, locking only surviving
> partitions. Adds CachedPlanPrepData to pass this through plan cache
> APIs and down to execution via QueryDesc. Also reinstates the
> firstResultRel locking rule added in 28317de72 but later lost due to
> revert of the earlier pruning patch, to ensure correctness when all
> target partitions are pruned.
Looking at the changes to executor/functions.c, I also noticed that I
had mistakenly allocated the ExecutorPrep state in
SQLFunctionCache.fcontext, whereas the correct context for
execution-related state is SQLFunctionCache.subcontext. In the updated
patch, I've made postquel_start() reparent the prep EState's
es_query_cxt from fcontext to subcontext. I also did not have a test
case that exercised cached plan reuse for SQL functions, so I added
one. I split the functions.c GetCachedPlan() + CachedPlanPrepData
plumbing into a new patch 0005 so it can be reviewed separately, since
it is the only non-mechanical call-site change.
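For readers less familiar with context reparenting, the following toy sketch shows the operation in isolation: a context is unlinked from its old parent's child list and linked under a new parent, so that it follows the new parent's lifetime. ToyContext, toy_set_parent() and toy_count_children() are hypothetical and are not the memory context implementation; the patch itself simply calls MemoryContextSetParent(). The usage below assumes, for illustration only, that subcontext is itself a child of fcontext.

```c
#include <assert.h>
#include <stddef.h>

/* Toy context node; only models the parent/child links. */
typedef struct ToyContext
{
    struct ToyContext *parent;
    struct ToyContext *firstchild;
    struct ToyContext *nextchild;   /* next sibling under the same parent */
} ToyContext;

/* Move cxt from its current parent (if any) to new_parent (may be NULL). */
void
toy_set_parent(ToyContext *cxt, ToyContext *new_parent)
{
    if (cxt->parent == new_parent)
        return;

    /* Unlink from the old parent's child list. */
    if (cxt->parent != NULL)
    {
        ToyContext **link = &cxt->parent->firstchild;

        while (*link != cxt)
            link = &(*link)->nextchild;
        *link = cxt->nextchild;
    }

    /* Link at the head of the new parent's child list. */
    cxt->parent = new_parent;
    if (new_parent != NULL)
    {
        cxt->nextchild = new_parent->firstchild;
        new_parent->firstchild = cxt;
    }
    else
        cxt->nextchild = NULL;
}

/* Count the direct children of cxt. */
int
toy_count_children(const ToyContext *cxt)
{
    int         n = 0;
    const ToyContext *child;

    for (child = cxt->firstchild; child != NULL; child = child->nextchild)
        n++;
    return n;
}
```

With fcontext, subcontext and a query context all zero-initialized, calling toy_set_parent(&query_cxt, &fcontext) followed by toy_set_parent(&query_cxt, &subcontext) leaves query_cxt as the only child of subcontext, which is the shape described above for the prep EState's es_query_cxt.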
> Benchmark results:
>
> echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
> for p in 32 64 128 256 512 1024; do pgbench -i --partitions=$p >
> /dev/null 2>&1; echo -ne "$p\t"; pgbench -n -S -T10 -Mprepared | grep
> tps; done
>
> Master
>
> 32 tps = 23841.822407 (without initial connection time)
> 64 tps = 21578.619816 (without initial connection time)
> 128 tps = 18090.500707 (without initial connection time)
> 256 tps = 14152.248201 (without initial connection time)
> 512 tps = 9432.708423 (without initial connection time)
> 1024 tps = 5873.696475 (without initial connection time)
>
> Patched
>
> 32 tps = 24724.245798 (without initial connection time)
> 64 tps = 24858.206407 (without initial connection time)
> 128 tps = 24652.655269 (without initial connection time)
> 256 tps = 23656.756615 (without initial connection time)
> 512 tps = 22299.865769 (without initial connection time)
> 1024 tps = 21911.704317 (without initial connection time)
I re-ran the benchmark to include the 0-partition case and partition counts above 1024:
echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
for p in 0 8 16 32 64 128 256 512 1024 2048 4096; do pgbench -i
--partitions=$p > /dev/null 2>&1; echo -ne "$p\t"; pgbench -n -S -T10
-Mprepared | grep tps; done
Master
0 tps = 23869.906654 (without initial connection time)
8 tps = 22682.498914 (without initial connection time)
16 tps = 22714.445711 (without initial connection time)
32 tps = 21653.589371 (without initial connection time)
64 tps = 20571.267545 (without initial connection time)
128 tps = 17138.088269 (without initial connection time)
256 tps = 13027.168426 (without initial connection time)
512 tps = 8689.486966 (without initial connection time)
1024 tps = 5450.525617 (without initial connection time)
2048 tps = 3034.383108 (without initial connection time)
4096 tps = 1560.110609 (without initial connection time)
Patched
0 tps = 23600.068719 (without initial connection time)
8 tps = 22548.439906 (without initial connection time)
16 tps = 22807.337363 (without initial connection time)
32 tps = 22837.789996 (without initial connection time)
64 tps = 22915.846820 (without initial connection time)
128 tps = 22958.472655 (without initial connection time)
256 tps = 22432.432730 (without initial connection time)
512 tps = 20327.618690 (without initial connection time)
1024 tps = 20554.932475 (without initial connection time)
2048 tps = 19947.061061 (without initial connection time)
4096 tps = 17294.369829 (without initial connection time)
Tabular format (+ve pct_change means patched better)
partitions    master      patched     pct_change
----------------------------------------------------
0             23869.91    23600.07    -1.1%
8             22682.50    22548.44    -0.6%
16            22714.45    22807.34    +0.4%
32            21653.59    22837.79    +5.5%
64            20571.27    22915.85    +11.4%
128           17138.09    22958.47    +34.0%
256           13027.17    22432.43    +72.2%
512            8689.49    20327.62    +133.9%
1024           5450.53    20554.93    +277.1%
2048           3034.38    19947.06    +557.4%
4096           1560.11    17294.37    +1008.5%
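For clarity on how the pct_change column is derived, it is the relative difference of patched over master, (patched - master) / master * 100. A minimal sketch follows; the function name pct_change() is only illustrative and is not part of any script used for the runs above.

```c
#include <assert.h>

/*
 * Relative change of patched throughput over master, in percent.
 * Positive means the patched build is faster.
 */
double
pct_change(double master_tps, double patched_tps)
{
    return (patched_tps - master_tps) / master_tps * 100.0;
}
```

For the 4096-partition row, pct_change(1560.11, 17294.37) comes out at roughly 1008.5, and for the 0-partition row pct_change(23869.91, 23600.07) is roughly -1.1, matching the table.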
I also did some runs for custom plans. The custom plan path should
behave about the same on master and patched since the early
ExecutorPrep() business only applies to generic plan reuse cases.
echo "plan_cache_mode = force_custom_plan" >> $PGDATA/postgresql.conf
for p in 0 8 16 32 64 128 256 512 1024 2048 4096; do pgbench -i
--partitions=$p > /dev/null 2>&1; echo -ne "$p\t"; pgbench -n -S -T10
-Mprepared | grep tps; done
Master
0 tps = 22346.419557 (without initial connection time)
8 tps = 20959.115560 (without initial connection time)
16 tps = 21390.573290 (without initial connection time)
32 tps = 21358.292393 (without initial connection time)
64 tps = 21288.742635 (without initial connection time)
128 tps = 21167.721447 (without initial connection time)
256 tps = 21256.618661 (without initial connection time)
512 tps = 19401.261197 (without initial connection time)
1024 tps = 19169.135145 (without initial connection time)
2048 tps = 19504.102179 (without initial connection time)
4096 tps = 18880.855783 (without initial connection time)
Patched
0 tps = 22852.634752 (without initial connection time)
8 tps = 21596.432690 (without initial connection time)
16 tps = 21428.779996 (without initial connection time)
32 tps = 20629.225272 (without initial connection time)
64 tps = 21301.644733 (without initial connection time)
128 tps = 21098.543942 (without initial connection time)
256 tps = 21394.364662 (without initial connection time)
512 tps = 19475.152170 (without initial connection time)
1024 tps = 19585.768438 (without initial connection time)
2048 tps = 19810.211969 (without initial connection time)
4096 tps = 19160.981608 (without initial connection time)
In tabular format:
partitions    master      patched     pct_change
----------------------------------------------------
0             22346.42    22852.63    +2.3%
8             20959.12    21596.43    +3.0%
16            21390.57    21428.78    +0.2%
32            21358.29    20629.23    -3.4%
64            21288.74    21301.64    +0.1%
128           21167.72    21098.54    -0.3%
256           21256.62    21394.36    +0.6%
512           19401.26    19475.15    +0.4%
1024          19169.14    19585.77    +2.2%
2048          19504.10    19810.21    +1.6%
4096          18880.86    19160.98    +1.5%
Numbers look within noise range as expected.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v2-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (6.5K, 2-v2-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
download | inline diff:
From eef8d1af46ca8deefbf8eb95428d37fc900a0944 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 17 Nov 2025 17:40:26 +0900
Subject: [PATCH v2 5/5] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per statement
ExecPrep pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users, which a later
patch will use to support pruning aware locking.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 33 +++++++++++++++++++++++--
src/test/regress/expected/plancache.out | 31 +++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 29 ++++++++++++++++++++++
3 files changed, 91 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..ed7352fce61 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,10 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
- NULL);
+ NULL,
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -719,9 +732,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i++) :
+ NULL;
execution_state *newes;
/*
@@ -763,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1361,8 +1378,20 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ if (es->prep)
+ {
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ EState *prep_estate = es->prep->prep_estate;
+
+ MemoryContextSetParent(prep_estate->es_query_cxt, fcache->subcontext);
+ }
+
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..8c68691df91 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,34 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..56ebbbdecd2 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,32 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v2-0001-Refactor-partition-pruning-initialization-for-cla.patch (7.7K, 3-v2-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 243d407de86b0a73b9bd8c8dbc541f630eb33747 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v2 1/5] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.

Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.

This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
2 files changed, 43 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aa12e9ad2ea..88b150c8d77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1772,6 +1771,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates:
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1796,6 +1798,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1803,11 +1828,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1825,20 +1850,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1846,8 +1864,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1965,14 +1981,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2206,8 +2220,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2219,8 +2233,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
[application/octet-stream] v2-0004-Use-pruning-aware-locking-in-cached-plans.patch (24.0K, 4-v2-0004-Use-pruning-aware-locking-in-cached-plans.patch)
download | inline diff:
From 74dc075dc8f844e036fc38e005fc512b6dd54bc9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v2 4/5] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.

Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.

This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.

To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
---
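The locking rule described in the commit message above can be illustrated with a toy model. This is only a sketch and not PostgreSQL code: range table indexes are modeled as bits in a 64-bit mask rather than a Bitmapset, and `relid_mask`, `rt_bit()` and `compute_lock_set()` are hypothetical names invented for the illustration. It shows the set of relations that would end up locked: the unprunable rels, the partitions that survive initial pruning, and the first result rel of each ModifyTable node even when pruning removed it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of RT indexes (bit n = RT index n). */
typedef uint64_t relid_mask;

/* Return a mask with only the bit for 'rtindex' set. */
relid_mask
rt_bit(int rtindex)
{
    assert(rtindex > 0 && rtindex < 64);
    return (relid_mask) 1 << rtindex;
}

/*
 * Toy version of the rule: lock every unprunable rel, every rel that
 * survived initial pruning, and every first result rel, whether or not
 * pruning kept it. Pruned partitions that are not first result rels are
 * left unlocked.
 */
relid_mask
compute_lock_set(relid_mask unpruned,
                 relid_mask unprunable,
                 relid_mask first_result_rels)
{
    return unprunable | unpruned | first_result_rels;
}
```

For example, with a parent at RT index 1, partitions at 2, 3 and 4, pruning that leaves only partition 3, and partition 2 as the first result rel, the model locks 1, 2 and 3 and skips 4.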
src/backend/commands/prepare.c | 19 +-
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 26 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 234 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 ++
src/include/utils/plancache.h | 24 ++-
10 files changed, 312 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..23332d19b37 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4c5647ac38a..c5812612f8d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4648,8 +4648,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..d580f1e0425 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2588,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d3964a12a14..249829f59a0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2039,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..c1cfd47422c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting ExecPrep pointers are appended to
+ * cprep->prep_list in cprep->context. On release, the same ExecPrep
+ * list is consulted to determine which relations to unlock and is then
+ * cleaned up with ExecPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2022,158 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the ExecPrep pointer for each PlannedStmt to cprep->prep_list
+ *
+ * On release, it:
+ * - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - calls ExecPrepCleanup() on each ExecPrep
+ *
+ * prep_list is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for ExecPrep happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext = MemoryContextSwitchTo(cprep->context);
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ Assert(cprep);
+
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 30d889b54c5..6fb86dc05f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..eb211f1ba56 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..c7b8ec4be39 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,27 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_list is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. The
+ * same list is later used to determine which relations to unlock when
+ * releasing execution locks.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* one ExecPrep per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate ExecPrep objects */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to pass to ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +261,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
[application/octet-stream] v2-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.1K, 5-v2-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From d9d95e09961dcb8236e5fe7b2da4a37fda8e5944 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v2 3/5] Reuse partition pruning results in parallel workers

Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.

Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.

While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
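For readers following the commit message above, here is a rough, standalone sketch of the kind of per-entry comparison it describes. This is not the patch's code: toy_set, toy_nonempty_difference() and toy_check_pruning_results() are hypothetical names, and a uint64_t bitmask stands in for Bitmapset. The sketch assumes the leader and worker result lists are parallel, so the index has to advance with every entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for Bitmapset: bit n set means subplan n survived pruning. */
typedef uint64_t toy_set;

/* Same sense as bms_nonempty_difference(a, b): true if a has a member not in b. */
static bool
toy_nonempty_difference(toy_set a, toy_set b)
{
    return (a & ~b) != 0;
}

/*
 * Compare the worker's recomputed results with the leader's, entry by entry.
 * Returns the index of the first mismatching entry, or -1 if every worker
 * set is contained in the corresponding leader set.
 */
static int
toy_check_pruning_results(const toy_set *leader, const toy_set *worker,
                          size_t n)
{
    assert(n == 0 || (leader != NULL && worker != NULL));

    for (size_t i = 0; i < n; i++)  /* the index advances with each entry */
    {
        if (toy_nonempty_difference(worker[i], leader[i]))
            return (int) i;
    }
    return -1;
}
```

Because the test is a one-directional difference, a worker set that is a strict subset of the leader's set passes. Only subplans that the worker keeps and the leader pruned are reported.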
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..751590adcc9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 187a480e508..3b450e3373f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1872,6 +1872,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
[application/octet-stream] v2-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (28.7K, 6-v2-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From 11e0262e31e35539f50e96531559db6cd7e32160 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v2 2/5] Introduce ExecutorPrep and refactor executor startup

Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present.

ExecutorStart() now expects QueryDesc->prep to point at such an
ExecPrep object. If no prep was supplied by the caller, it
invokes ExecutorPrep() itself and adopts the resulting EState
for the duration of the query. This keeps the executor startup
behaviour unchanged while making the setup work callable
separately when needed.

CreateQueryDesc() grows a prep argument and stores it in the
QueryDesc. Portals, SPI, SQL functions, and EXPLAIN are wired
to carry an optional ExecPrep pointer alongside the PlannedStmt
list, but most callers still pass NULL and let ExecutorStart()
perform the setup lazily.

Add the ExecPrep struct and ExecPrepCleanup() to encapsulate
ownership of the prepared EState and any caller-specific
cleanup hook. Update executor/README and related comments to
document the new control flow and the separation between
preparation and execution.
---
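The ownership handoff described in the commit message above can be sketched in isolation roughly as follows. This is only a toy model under stated assumptions, not the patch's implementation: ToyPrep, toy_prep_create(), toy_start(), toy_prep_cleanup() and toy_count_hook() are hypothetical names, and a heap-allocated int stands in for the prepared EState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*toy_cleanup_fn) (void *arg);

typedef struct ToyPrep
{
    int        *state;          /* stands in for the prepared EState */
    bool        owns_state;     /* false once the executor adopts the state */
    toy_cleanup_fn cleanup;     /* optional caller-specific hook */
    void       *cleanup_arg;
} ToyPrep;

/* Build a prep object that initially owns its state. */
static ToyPrep *
toy_prep_create(toy_cleanup_fn cleanup, void *cleanup_arg)
{
    ToyPrep    *prep = calloc(1, sizeof(ToyPrep));

    assert(prep != NULL);
    prep->state = calloc(1, sizeof(int));
    assert(prep->state != NULL);
    prep->owns_state = true;
    prep->cleanup = cleanup;
    prep->cleanup_arg = cleanup_arg;
    return prep;
}

/* Executor startup: prepare lazily if the caller did not, then adopt. */
static int *
toy_start(ToyPrep **prep)
{
    if (*prep == NULL)
        *prep = toy_prep_create(NULL, NULL);
    (*prep)->owns_state = false;    /* the state now belongs to the executor */
    return (*prep)->state;
}

/*
 * Free the state only if it was never adopted, and run the hook if one was
 * supplied.  Returns true if the state was freed here.
 */
static bool
toy_prep_cleanup(ToyPrep *prep)
{
    bool        freed = false;

    if (prep == NULL)
        return false;
    if (prep->state != NULL && prep->owns_state)
    {
        free(prep->state);
        prep->state = NULL;
        freed = true;
    }
    if (prep->cleanup != NULL)
        prep->cleanup(prep->cleanup_arg);
    return freed;
}

/* Example hook: counts how many times cleanup ran. */
static void
toy_count_hook(void *arg)
{
    (*(int *) arg)++;
}
```

In this model, a prep object that is created early but never started releases its state in the cleanup call. One that has been through toy_start() leaves the state for the executor's own teardown. The flag is what avoids a double free in the second case.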
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 8 +-
src/backend/executor/execMain.c | 179 +++++++++++++++++++++++----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 11 ++
src/include/nodes/execnodes.h | 48 +++++++
src/include/utils/portal.h | 2 +
21 files changed, 286 insertions(+), 40 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 93ef1ad106f..3cca6d45ec1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..95b5ec58c55 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,16 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Performs range
+ table initialization, permission checks, and initial partition pruning.
+ Returns an ExecPrep wrapper with EState that ExecutorStart may reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..39de0b93a1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -171,8 +171,26 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep == NULL)
+ queryDesc->prep = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ Assert(queryDesc->prep);
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +281,136 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later.
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ bool snapshot_set;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Pruning may use expressions that require an active snapshot. */
+ snapshot_set = false;
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snapshot_set = true;
+ }
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ /* Release snapshot if we got one */
+ if (snapshot_set)
+ PopActiveSnapshot();
+
+ return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +972,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,37 +985,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->prep);
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88b150c8d77..187a480e508 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2368,6 +2368,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2bd89102686..d3964a12a14 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..3579926d4e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,16 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..8bdecd631bf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,54 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * ExecutorPrep() factors out executor setup steps such as initializing the
+ * range table, checking permissions, and executing initial partition pruning.
+ * ExecutorStart() can reuse the prepared EState instead of repeating that
+ * work, and other callers (such as plan cache validation) can use it without
+ * running the full plan.
+ */
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(void *prep);
+
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
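The ownership rule described in the ExecPrep comments above (the owns_estate flag and the optional cleanup hook) can be illustrated with a small standalone C sketch. This is not PostgreSQL code: ToyPrep, ToyEState, toy_prep_create(), toy_prep_adopt_estate() and toy_prep_cleanup() are hypothetical names, malloc/free stand in for memory contexts, and passing cleanup_arg to the hook is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef struct ToyEState
{
    int         dummy;
} ToyEState;

typedef void (*toy_cleanup_fn) (void *arg);

typedef struct ToyPrep
{
    ToyEState  *estate;         /* partially initialized executor state */
    bool        owns_estate;    /* must cleanup free the estate? */
    toy_cleanup_fn cleanup;     /* optional caller-supplied hook */
    void       *cleanup_arg;    /* opaque argument for the hook */
} ToyPrep;

/* Counts estates freed by toy_prep_cleanup(), for demonstration only. */
int         toy_estates_freed = 0;

ToyPrep *
toy_prep_create(toy_cleanup_fn cleanup, void *cleanup_arg)
{
    ToyPrep    *prep = malloc(sizeof(ToyPrep));

    assert(prep != NULL);
    prep->estate = malloc(sizeof(ToyEState));
    assert(prep->estate != NULL);
    prep->estate->dummy = 0;
    prep->owns_estate = true;   /* until someone adopts the estate */
    prep->cleanup = cleanup;
    prep->cleanup_arg = cleanup_arg;
    return prep;
}

/* The "executor" takes over the estate; cleanup must no longer free it. */
ToyEState *
toy_prep_adopt_estate(ToyPrep *prep)
{
    prep->owns_estate = false;
    return prep->estate;
}

void
toy_prep_cleanup(ToyPrep *prep)
{
    if (prep == NULL)
        return;
    if (prep->cleanup != NULL)
        prep->cleanup(prep->cleanup_arg);
    if (prep->owns_estate)
    {
        free(prep->estate);
        toy_estates_freed++;
    }
    free(prep);
}

/* Example hook: bumps the integer counter passed as cleanup_arg. */
void
toy_count_hook(void *arg)
{
    (*(int *) arg)++;
}
```

In this sketch, a prep whose estate was adopted leaves the estate alive after cleanup, so the adopter remains responsible for freeing it; that is the double-free hazard the owns_estate flag is meant to avoid.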
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-11-20 07:30 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Mon, Nov 17, 2025 at 9:50 PM Amit Langote <[email protected]> wrote:
> On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]> wrote:
> > * Enable pruning-aware locking in cached / generic plan reuse (0004):
> > extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
> > on each PlannedStmt in the CachedPlan, locking only surviving
> > partitions. Adds CachedPlanPrepData to pass this through plan cache
> > APIs and down to execution via QueryDesc. Also reinstates the
> > firstResultRel locking rule added in 28317de72 but later lost due to
> > revert of the earlier pruning patch, to ensure correctness when all
> > target partitions are pruned.
>
> Looking at the changes to executor/functions.c, I also noticed that I
> had mistakenly allocated the ExecutorPrep state in
> SQLFunctionCache.fcontext whereas the correct context for execution
> related state is SQLFunctionCache.subcontext. In the updated patch,
> I've made postquel_start() reparent the prep EState's es_query_cxt to
> subcontext from fcontext. I also did not have a test case that
> exercised cached plan reuse for SQL functions, so I added one. I split
> the functions.c GetCachedPlan() + CachedPlanPrepData plumbing into a
> new patch 0005 so it can be reviewed separately, since it is the only
> non-mechanical call-site change.
I also noticed a bug in the prep cleanup logic that runs when a cached
plan becomes invalid during the prep phase. Patch 0005 fixes that and
adds a regression test that exercises the invalidation path. This will
be folded into 0004 later.
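A rough standalone sketch of the sequence described above: lock the surviving relations, recheck plan validity, and, if the plan turned out to be invalid, release exactly the locks recorded during prep and discard the prep state. This is not the patch's code; ToyPrepState, toy_lock_counts, toy_lock_set() and toy_prep_and_check() are hypothetical names, relations are modeled as bits in a 32-bit mask, and invalidation is modeled as a plain boolean rather than a real invalidation callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_RELS 32

/* Toy lock table: hold count per relation (hypothetical). */
int         toy_lock_counts[TOY_MAX_RELS];

typedef struct ToyPrepState
{
    uint32_t    locked;         /* bitmask of relations locked during prep */
} ToyPrepState;

/* Acquire (or release) one lock on every relation in the bitmask. */
void
toy_lock_set(uint32_t relids, bool acquire)
{
    for (int rel = 0; rel < TOY_MAX_RELS; rel++)
    {
        if (relids & (UINT32_C(1) << rel))
            toy_lock_counts[rel] += acquire ? 1 : -1;
    }
}

/*
 * Lock the surviving relations and remember them in a prep state.  If the
 * plan is found invalid after locking, undo the locks using the recorded
 * set, free the prep state, and return NULL so that no stale prep state
 * leaks to the caller.
 */
ToyPrepState *
toy_prep_and_check(uint32_t surviving, bool valid_after_locking)
{
    ToyPrepState *prep = malloc(sizeof(ToyPrepState));

    assert(prep != NULL);
    prep->locked = surviving;
    toy_lock_set(prep->locked, true);

    if (valid_after_locking)
        return prep;

    /* The race case: release useless locks and discard the prep state. */
    toy_lock_set(prep->locked, false);
    free(prep);
    return NULL;
}
```

The point of the sketch is only that the release side has to use the same recorded set as the acquire side, and that the prep state must not outlive a plan that was found to be stale.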
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v3-0004-Use-pruning-aware-locking-in-cached-plans.patch (24.5K, 2-v3-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From dc0de03510539ddc3bd33327158785279356821f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v3 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.
This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior that was added in commit
28317de72 but lost when the earlier pruning patch was reverted. That
commit required the first ModifyTable target to remain initialized for
executor assumptions to hold. We now explicitly track these relids in
PlannerGlobal and PlannedStmt so they are locked even if pruned,
preserving that rule across cached plan reuse.
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 26 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 234 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 ++
src/include/utils/plancache.h | 24 ++-
11 files changed, 313 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..23332d19b37 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..d81718ea84e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4c5647ac38a..c5812612f8d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4648,8 +4648,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..d580f1e0425 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2588,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d3964a12a14..249829f59a0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2039,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..c1cfd47422c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting ExecPrep pointers are appended to
+ * cprep->prep_list in cprep->context. On release, the same ExecPrep
+ * list is consulted to determine which relations to unlock and is then
+ * cleaned up with ExecPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2022,158 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the ExecPrep pointer for each PlannedStmt to cprep->prep_list
+ *
+ * On release, it:
+ * - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - calls ExecPrepCleanup() on each ExecPrep
+ *
+ * prep_list is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for ExecPrep happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving initial runtime pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 30d889b54c5..6fb86dc05f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..eb211f1ba56 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..c7b8ec4be39 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,27 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_list is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. The
+ * same list is later used to determine which relations to unlock when
+ * releasing execution locks.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* one ExecPrep per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate ExecPrep objects */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to pass to ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +261,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
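A note on the prep_list convention described in the CachedPlanPrepData comment above: the list runs parallel to stmt_list, and an entry may be NULL (ExecutorPrep() returns NULL for utility statements). The following is only a toy sketch of that lookup pattern, not code from the patch. ToyStmt, ToyPrep, toy_prep_for_stmt() and toy_count_prepared() are hypothetical names, and plain C arrays stand in for PostgreSQL's List.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT PostgreSQL types. */
typedef struct ToyStmt { int is_utility; } ToyStmt;
typedef struct ToyPrep { size_t stmt_index; } ToyPrep;

/*
 * Return the prep entry for statement i, or NULL when no prep list was
 * built at all or when the slot for that statement is empty.
 */
static ToyPrep *
toy_prep_for_stmt(ToyPrep *const *prep_list, size_t nstmts, size_t i)
{
    assert(i < nstmts);
    return prep_list ? prep_list[i] : NULL;
}

/* Count the non-utility statements that have prep state available. */
static size_t
toy_count_prepared(const ToyStmt *stmts, ToyPrep *const *prep_list,
                   size_t nstmts)
{
    size_t n = 0;

    for (size_t i = 0; i < nstmts; i++)
    {
        ToyPrep *prep = toy_prep_for_stmt(prep_list, nstmts, i);

        if (!stmts[i].is_utility && prep != NULL)
        {
            /* Slot i must describe statement i (one-to-one indexing). */
            assert(prep->stmt_index == i);
            n++;
        }
    }
    return n;
}
```

The point of the sketch is that a caller can treat "no list" and "empty slot" the same way and fall back to doing the setup lazily.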
[application/octet-stream] v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch (9.3K, 3-v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch)
From 052ab8fe38493ca106d749f4e2426a86d0267d59 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 20 Nov 2025 15:35:47 +0900
Subject: [PATCH v3 5/6] Add test exercising prep cleanup on cached-plan
invalidation
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.
The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
src/backend/utils/cache/plancache.c | 38 +++++++++++++--
src/test/regress/expected/plancache.out | 61 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 50 ++++++++++++++++++++
3 files changed, 144 insertions(+), 5 deletions(-)
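For readers following the cleanup path this patch tests, here is a toy model of the flow described in the commit message: setup work runs, a pruning expression invalidates the plan along the way, and the checker then has to drop both the locks and the prep state before the caller replans. It is only a sketch under simplifying assumptions. All names (ToyPlan, ToyPrepData, toy_prep_and_check() and so on) are hypothetical, and counters stand in for real locks and ExecPrep objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSTMTS 2

/* Toy stand-ins; these are NOT PostgreSQL types. */
typedef struct ToyPlan { bool is_valid; } ToyPlan;

typedef struct ToyPrepData
{
    int nprep;   /* live prep entries, one per statement */
    int nlocks;  /* locks currently held */
} ToyPrepData;

/* A pruning step; it may run arbitrary code, including DDL. */
typedef void (*toy_prune_fn) (ToyPlan *plan);

static void toy_prune_quiet(ToyPlan *plan) { (void) plan; }
static void toy_prune_with_ddl(ToyPlan *plan) { plan->is_valid = false; }

/* Discard all prep state built so far. */
static void
toy_prep_cleanup(ToyPrepData *cprep)
{
    if (cprep == NULL)
        return;
    cprep->nprep = 0;
}

/*
 * Build prep state and take locks for each statement, then recheck
 * validity.  Returns false if the plan went stale, in which case nothing
 * is left behind for the caller to clean up.
 */
static bool
toy_prep_and_check(ToyPlan *plan, ToyPrepData *cprep, toy_prune_fn prune)
{
    assert(plan != NULL && cprep != NULL);

    for (int i = 0; i < TOY_NSTMTS; i++)
    {
        prune(plan);
        cprep->nprep++;
        cprep->nlocks++;
    }

    if (plan->is_valid)
        return true;

    /* Race case: release useless locks and drop the prep state too. */
    cprep->nlocks = 0;
    toy_prep_cleanup(cprep);
    return false;
}
```

In this model, calling toy_prep_and_check() with toy_prune_with_ddl corresponds to the second EXECUTE in the regression test, where the DDL fires during pruning.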
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index c1cfd47422c..a9a4e11d1a5 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -103,6 +103,7 @@ static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -1033,6 +1034,9 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
/* Oops, the race case happened. Release useless locks. */
AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -2069,7 +2073,6 @@ LockRelids(List *rtable, Bitmapset *relids, bool acquire)
* - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
* (which must already be populated)
* - unlocks the same relations identified during acquire
- * - calls ExecPrepCleanup() on each ExecPrep
*
* prep_list is extended during acquire and must match stmt_list one-to-one
* when releasing locks. Memory allocation for ExecPrep happens in
@@ -2165,15 +2168,40 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
LockRelids(plannedstmt->rtable, lock_relids, acquire);
bms_free(lock_relids);
}
-
- /* Clean up prep if releasing locks. */
- if (!acquire)
- ExecPrepCleanup(prep);
}
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * CachedPlanPrepCleanup
+ * Clean up ExecPrep state built for a generic plan.
+ *
+ * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the ExecPrep list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+
+ if (cprep == NULL)
+ return;
+
+ foreach(lc, cprep->prep_list)
+ {
+ ExecPrep *prep = (ExecPrep *) lfirst(lc);
+
+ ExecPrepCleanup(prep);
+ }
+
+ list_free(cprep->prep_list);
+ cprep->prep_list = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..26c4c5e10fd 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,64 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..cc7eb4da4d3 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,53 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (28.7K, 4-v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From 11e0262e31e35539f50e96531559db6cd7e32160 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v3 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present.
ExecutorStart() now expects QueryDesc->prep to point at such an
ExecPrep object. If no prep was supplied by the caller, it
invokes ExecutorPrep() itself and adopts the resulting EState
for the duration of the query. This keeps the executor startup
behaviour unchanged while making the setup work callable
separately when needed.
CreateQueryDesc() grows a prep argument and stores it in the
QueryDesc. Portals, SPI, SQL functions, and EXPLAIN are wired
to carry an optional ExecPrep pointer alongside the PlannedStmt
list, but most callers still pass NULL and let ExecutorStart()
perform the setup lazily.
Add the ExecPrep struct and ExecPrepCleanup() to encapsulate
ownership of the prepared EState and any caller-specific
cleanup hook. Update executor/README and related comments to
document the new control flow and the separation between
preparation and execution.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 8 +-
src/backend/executor/execMain.c | 179 +++++++++++++++++++++++----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 11 ++
src/include/nodes/execnodes.h | 48 +++++++
src/include/utils/portal.h | 2 +
21 files changed, 286 insertions(+), 40 deletions(-)
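The ownership handoff described in the commit message (the prep wrapper owns the EState until ExecutorStart() adopts it) can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the patch's implementation. ToyEState, ToyPrep and the toy_* functions are hypothetical, malloc/free stand in for memory contexts, and unlike the patch's ExecPrepCleanup() the toy cleanup also frees the wrapper itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT PostgreSQL types. */
typedef struct ToyEState { int pruned_subplans; } ToyEState;

typedef struct ToyPrep
{
    ToyEState *estate;       /* state built ahead of execution */
    bool       owns_estate;  /* false once the executor adopts it */
} ToyPrep;

static int toy_estates_freed = 0;   /* counts frees, for demonstration */

/* Build state up front, as a caller validating a cached plan might. */
static ToyPrep *
toy_prep(void)
{
    ToyPrep *prep = calloc(1, sizeof(ToyPrep));

    assert(prep != NULL);
    prep->estate = calloc(1, sizeof(ToyEState));
    assert(prep->estate != NULL);
    prep->owns_estate = true;
    return prep;
}

/* Executor startup: build the prep lazily if absent, then adopt its state. */
static ToyEState *
toy_start(ToyPrep **prep)
{
    if (*prep == NULL)
        *prep = toy_prep();
    (*prep)->owns_estate = false;
    return (*prep)->estate;
}

/* Executor shutdown frees the state it adopted. */
static void
toy_end(ToyEState *estate)
{
    free(estate);
    toy_estates_freed++;
}

/* Prep cleanup frees the state only if nobody adopted it. */
static void
toy_prep_cleanup(ToyPrep *prep)
{
    if (prep == NULL)
        return;
    if (prep->estate && prep->owns_estate)
    {
        free(prep->estate);
        toy_estates_freed++;
    }
    free(prep);
}

/* Prep adopted and executed: the state must be freed exactly once. */
static int
toy_demo_adopted(void)
{
    ToyPrep   *prep = NULL;
    ToyEState *estate;

    toy_estates_freed = 0;
    estate = toy_start(&prep);   /* lazy prep, then adoption */
    toy_end(estate);
    toy_prep_cleanup(prep);      /* must not free the state again */
    return toy_estates_freed;
}

/* Prep discarded before execution: cleanup frees the state itself. */
static int
toy_demo_discarded(void)
{
    ToyPrep *prep = toy_prep();

    toy_estates_freed = 0;
    toy_prep_cleanup(prep);
    return toy_estates_freed;
}
```

Both scenarios free the state exactly once. The owns_estate flag is what prevents a double free when the executor has taken the state over.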
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 93ef1ad106f..3cca6d45ec1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..95b5ec58c55 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,16 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Performs range
+ table initialization, permission checks, and initial partition pruning.
+ Returns an ExecPrep wrapper with EState that ExecutorStart may reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..39de0b93a1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -171,8 +171,26 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep == NULL)
+ queryDesc->prep = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ Assert(queryDesc->prep);
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +281,136 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later.
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ bool snapshot_set;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Pruning may use expressions that require an active snapshot. */
+ snapshot_set = false;
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snapshot_set = true;
+ }
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ /* Release snapshot if we got one */
+ if (snapshot_set)
+ PopActiveSnapshot();
+
+ return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +972,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,37 +985,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->prep);
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88b150c8d77..187a480e508 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2368,6 +2368,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2bd89102686..d3964a12a14 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..3579926d4e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,16 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..8bdecd631bf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,54 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * ExecutorPrep() factors out executor setup steps such as initializing the
+ * range table, checking permissions, and executing initial partition pruning.
+ * ExecutorStart() can reuse the prepared EState instead of repeating that
+ * work, and other callers (such as plan cache validation) can use it without
+ * running the full plan.
+ */
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(void *prep);
+
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
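For readers following the ownership rules spelled out in the ExecPrep comments above, here is a minimal standalone sketch of the owns_estate handoff in plain C, with no PostgreSQL headers. Every name in it (MockEState, MockPrep, mock_prep_adopt, and so on) is a hypothetical stand-in rather than the patch's implementation, malloc/free stand in for memory contexts, and running the hook before releasing the state is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for EState; it only records how often it is freed. */
typedef struct MockEState
{
	int		   *free_count;
} MockEState;

typedef void (*mock_cleanup_fn) (void *arg);

/* Hypothetical stand-in for ExecPrep, mirroring the fields in the patch. */
typedef struct MockPrep
{
	MockEState *prep_estate;
	bool		owns_estate;	/* true while cleanup must free prep_estate */
	mock_cleanup_fn cleanup;	/* optional caller-supplied hook */
	void	   *cleanup_arg;
} MockPrep;

static void
mock_estate_free(MockEState *estate)
{
	(*estate->free_count)++;
	free(estate);
}

static MockPrep *
mock_prep_create(int *free_count, mock_cleanup_fn cleanup, void *cleanup_arg)
{
	MockPrep   *prep = malloc(sizeof(MockPrep));
	MockEState *estate = malloc(sizeof(MockEState));

	if (prep == NULL || estate == NULL)
		abort();
	estate->free_count = free_count;
	prep->prep_estate = estate;
	prep->owns_estate = true;
	prep->cleanup = cleanup;
	prep->cleanup_arg = cleanup_arg;
	return prep;
}

/* The "executor" adopts the state; from now on it, not the prep, frees it. */
static MockEState *
mock_prep_adopt(MockPrep *prep)
{
	assert(prep->owns_estate);
	prep->owns_estate = false;
	return prep->prep_estate;
}

/* Run the hook (assumed order), free the state only if still owned. */
static void
mock_prep_cleanup(MockPrep *prep)
{
	if (prep->cleanup != NULL)
		prep->cleanup(prep->cleanup_arg);
	if (prep->owns_estate)
		mock_estate_free(prep->prep_estate);
	free(prep);
}

static void
mock_count_hook(void *arg)
{
	(*(int *) arg)++;
}

/* Exercise both paths and report how many frees and hook calls happened. */
void
demo_prep_lifecycle(bool adopt, int *frees, int *hook_calls)
{
	MockPrep   *prep;

	*frees = 0;
	*hook_calls = 0;
	prep = mock_prep_create(frees, mock_count_hook, hook_calls);
	if (adopt)
	{
		MockEState *estate = mock_prep_adopt(prep);

		mock_prep_cleanup(prep);	/* skips the adopted state */
		mock_estate_free(estate);	/* the adopter frees it exactly once */
	}
	else
		mock_prep_cleanup(prep);	/* prep still owns and frees the state */
}
```

On either path the state is released exactly once and the hook runs exactly once, which is the property the owns_estate flag is there to guarantee.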
[application/octet-stream] v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (6.7K, 5-v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 733e3c712ec59b75da031694155c98476f290f37 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 17 Nov 2025 17:40:26 +0900
Subject: [PATCH v3 6/6] Make SQL function executor track ExecutorPrep state

Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
ExecPrep pointers in the execution_state nodes.

At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.

This allows SQL-language functions to participate in the same
ExecutorPrep machinery as other plan cache users, which a later
patch will use to support pruning-aware locking.

Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 32 +++++++++++++++++++++++--
src/test/regress/expected/plancache.out | 30 +++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 27 +++++++++++++++++++++
3 files changed, 87 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index d81718ea84e..ed7352fce61 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i++) :
+ NULL;
execution_state *newes;
/*
@@ -764,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,8 +1378,20 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ if (es->prep)
+ {
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ EState *prep_estate = es->prep->prep_estate;
+
+ MemoryContextSetParent(prep_estate->es_query_cxt, fcache->subcontext);
+ }
+
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 26c4c5e10fd..bf937364716 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -458,4 +458,34 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index cc7eb4da4d3..71320799040 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -272,4 +272,31 @@ explain (verbose, costs off) execute inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+
reset plan_cache_mode;
--
2.47.3
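The reparenting step in postquel_start() above is easier to picture with a toy context tree. The following is a minimal sketch in plain C, not PostgreSQL's memory context code: MockContext and the ctx_* helpers are hypothetical, and it assumes that releasing a per-call context also releases all of its children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context node in a parent/child tree. */
typedef struct MockContext
{
	struct MockContext *parent;
	struct MockContext *first_child;
	struct MockContext *next_sibling;
	bool		alive;
} MockContext;

/* Push ctx onto the front of parent's child list. */
static void
ctx_link(MockContext *ctx, MockContext *parent)
{
	ctx->parent = parent;
	ctx->next_sibling = NULL;
	if (parent != NULL)
	{
		ctx->next_sibling = parent->first_child;
		parent->first_child = ctx;
	}
}

static void
ctx_init(MockContext *ctx, MockContext *parent)
{
	ctx->first_child = NULL;
	ctx->alive = true;
	ctx_link(ctx, parent);
}

/* Remove ctx from its parent's child list, if it has a parent. */
static void
ctx_unlink(MockContext *ctx)
{
	MockContext **link;

	if (ctx->parent == NULL)
		return;
	for (link = &ctx->parent->first_child; *link != NULL;
		 link = &(*link)->next_sibling)
	{
		if (*link == ctx)
		{
			*link = ctx->next_sibling;
			break;
		}
	}
	ctx->parent = NULL;
	ctx->next_sibling = NULL;
}

/* Move ctx (and its whole subtree) under new_parent. */
static void
ctx_set_parent(MockContext *ctx, MockContext *new_parent)
{
	assert(ctx != new_parent);
	ctx_unlink(ctx);
	ctx_link(ctx, new_parent);
}

/* Release every descendant of ctx, leaving ctx itself alive. */
static void
ctx_delete_children(MockContext *ctx)
{
	while (ctx->first_child != NULL)
	{
		MockContext *child = ctx->first_child;

		ctx_delete_children(child);
		ctx_unlink(child);
		child->alive = false;
	}
}

/*
 * Build fcontext -> {subcontext, query_cxt}, optionally reparent query_cxt
 * under subcontext, then end the "call" by clearing subcontext's children.
 * Returns whether query_cxt survived the end of the call.
 */
bool
demo_reparent(bool reparent)
{
	MockContext fcontext;
	MockContext subcontext;
	MockContext query_cxt;

	ctx_init(&fcontext, NULL);
	ctx_init(&subcontext, &fcontext);
	ctx_init(&query_cxt, &fcontext);	/* prep state built under fcontext */
	if (reparent)
		ctx_set_parent(&query_cxt, &subcontext);
	ctx_delete_children(&subcontext);	/* end of the function call */
	return query_cxt.alive;
}
```

With reparenting, the prep state's context goes away together with the per-call context; without it, the context would outlive the call under the longer-lived parent.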
[application/octet-stream] v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.1K, 6-v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From d9d95e09961dcb8236e5fe7b2da4a37fda8e5944 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v3 3/6] Reuse partition pruning results in parallel workers

Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.

Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.

While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..751590adcc9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
 paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 187a480e508..3b450e3373f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1872,6 +1872,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
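The leader/worker cross-check described in the commit message above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: 64-bit masks stand in for Bitmapsets, mock_prune() is a made-up pruning step, and all Mock*/mock_* names are hypothetical. The comparison itself follows what bms_nonempty_difference(a, b) tests, namely whether a has any member that b lacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t MockBitset;	/* bit n set means subplan n is valid */

/* True if 'a' has a member that 'b' lacks. */
static bool
mock_nonempty_difference(MockBitset a, MockBitset b)
{
	return (a & ~b) != 0;
}

/* Made-up pruning step: keep candidates allowed by the parameter mask. */
static MockBitset
mock_prune(MockBitset candidates, MockBitset param_mask)
{
	return candidates & param_mask;
}

/*
 * Recompute pruning in the "worker" for each prune state and compare it
 * with the matching entry supplied by the "leader". Returns the index of
 * the first mismatching state, or -1 if every state passes.
 */
int
mock_check_pruning_results(const MockBitset *leader_results,
						   const MockBitset *candidates,
						   size_t nstates,
						   MockBitset worker_param_mask)
{
	size_t		i;

	assert(leader_results != NULL && candidates != NULL);
	for (i = 0; i < nstates; i++)
	{
		MockBitset	local = mock_prune(candidates[i], worker_param_mask);

		if (mock_nonempty_difference(local, leader_results[i]))
			return (int) i;
	}
	return -1;
}

/* Leader prunes two states with mask 0x33; worker uses the given mask. */
int
demo_check(MockBitset worker_param_mask)
{
	const MockBitset candidates[2] = {0x0F, 0xF0};
	const MockBitset leader_mask = 0x33;
	MockBitset	leader_results[2];
	size_t		i;

	for (i = 0; i < 2; i++)
		leader_results[i] = mock_prune(candidates[i], leader_mask);
	return mock_check_pruning_results(leader_results, candidates, 2,
									  worker_param_mask);
}
```

Note that a one-directional difference test is a subset check: in this model a worker result that is a strict subset of the leader's (mask 0x13) passes, while a worker result with an extra member (mask 0x73) is reported at the state where it occurs.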
[application/octet-stream] v3-0001-Refactor-partition-pruning-initialization-for-cla.patch (7.7K, 7-v3-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 243d407de86b0a73b9bd8c8dbc541f630eb33747 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v3 1/6] Refactor partition pruning initialization for clarity
and modularity

Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.

Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.

This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
2 files changed, 43 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aa12e9ad2ea..88b150c8d77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1772,6 +1771,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1796,6 +1798,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1803,11 +1828,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1825,20 +1850,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1846,8 +1864,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1965,14 +1981,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2206,8 +2220,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2219,8 +2233,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
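To make the split in this refactoring concrete, the following standalone sketch models the two phases: state creation (which also records all leaf partitions as unpruned when initial pruning is being skipped) and the separate initial-pruning pass. It is a simplified model, not the patch's code: 64-bit masks stand in for Bitmapsets, the Mock* types and mock_* functions are hypothetical, and the "pruning" is just a mask intersection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_STATES 4

typedef uint64_t MockBitset;	/* bit n set means RT index n is included */

typedef struct MockPruneInfo
{
	MockBitset	leaf_rtis;		/* RT indexes of all leaf partitions */
	bool		has_initial_steps;
} MockPruneInfo;

typedef struct MockPruneState
{
	const MockPruneInfo *info;
	bool		do_initial_prune;
} MockPruneState;

typedef struct MockEState
{
	const MockPruneInfo *infos;
	size_t		ninfos;
	MockPruneState states[MOCK_MAX_STATES];
	size_t		nstates;
	MockBitset	unpruned_relids;
	bool		skip_initial_pruning;	/* e.g. an EXPLAIN (GENERIC_PLAN) case */
} MockEState;

/* Phase 1: build one state per info; no pruning is performed here. */
void
mock_create_prune_states(MockEState *estate)
{
	size_t		i;

	assert(estate->ninfos <= MOCK_MAX_STATES);
	for (i = 0; i < estate->ninfos; i++)
	{
		const MockPruneInfo *info = &estate->infos[i];
		MockPruneState *state = &estate->states[i];

		state->info = info;
		state->do_initial_prune =
			info->has_initial_steps && !estate->skip_initial_pruning;

		/* Pruning skipped: every leaf partition counts as unpruned. */
		if (info->has_initial_steps && !state->do_initial_prune)
			estate->unpruned_relids |= info->leaf_rtis;
	}
	estate->nstates = estate->ninfos;
}

/* Phase 2: iterate over the states built earlier and record survivors. */
void
mock_do_initial_pruning(MockEState *estate, MockBitset param_mask)
{
	size_t		i;

	for (i = 0; i < estate->nstates; i++)
	{
		const MockPruneState *state = &estate->states[i];

		if (state->do_initial_prune)
			estate->unpruned_relids |= state->info->leaf_rtis & param_mask;
	}
}

/* Run both phases over two infos and return the resulting unpruned set. */
MockBitset
demo_unpruned(bool skip_initial_pruning)
{
	static const MockPruneInfo infos[2] = {{0x06, true}, {0x18, true}};
	MockEState	estate = {0};

	estate.infos = infos;
	estate.ninfos = 2;
	estate.skip_initial_pruning = skip_initial_pruning;
	mock_create_prune_states(&estate);
	mock_do_initial_pruning(&estate, 0x0A);
	return estate.unpruned_relids;
}
```

Because the first phase stands on its own, a caller that needs only the metadata can stop after mock_create_prune_states(); in the skipped case the second phase is a no-op and the unpruned set already contains every leaf partition.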
* Re: generic plans and "initial" pruning
@ 2025-11-23 12:17 ` Tender Wang <[email protected]>
2025-11-25 01:56 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
1 sibling, 1 reply; 66+ messages in thread
From: Tender Wang @ 2025-11-23 12:17 UTC
To: Amit Langote <[email protected]>; +Cc: Tom Lane <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, Nov 20, 2025 at 15:30, Amit Langote <[email protected]> wrote:
> On Mon, Nov 17, 2025 at 9:50 PM Amit Langote <[email protected]>
> wrote:
> > On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]>
> wrote:
> > > * Enable pruning-aware locking in cached / generic plan reuse (0004):
> > > extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
> > > on each PlannedStmt in the CachedPlan, locking only surviving
> > > partitions. Adds CachedPlanPrepData to pass this through plan cache
> > > APIs and down to execution via QueryDesc. Also reinstates the
> > > firstResultRel locking rule added in 28317de72 but later lost due to
> > > revert of the earlier pruning patch, to ensure correctness when all
> > > target partitions are pruned.
> >
> > Looking at the changes to executor/function.c, I also noticed that I
> > had mistakenly allocated the ExecutorPrep state in
> > SQLFunctionCache.fcontext whereas the correct context for execution
> > related state is SQLFunctionCache.subcontext. In the updated patch,
> > I've made postquel_start() reparent the prep EState's es_query_cxt to
> > subcontext from fcontext. I also did not have a test case that
> > exercised cached plan reuse for SQL functions, so I added one. I split
> > the function.c's GetCachedPlan() + CachedPlanPrepData plumbing into a
> > new patch 0005 so it can be reviewed separately, since it is the only
> > non-mechanical call-site change.
>
> I also noticed a bug in the prep cleanup logic that runs when a cached
> plan becomes invalid during the prep phase. Patch 0005 fixes that and
> adds a regression test that exercises the invalidation path. This will
> be folded into 0004 later.
>
I spent some time looking at these patches.
I searched all the places that call GetCachedPlan(), and we always pass
&cprep (a CachedPlanPrepData) to GetCachedPlan().
In PrepAndCheckCachedPlan(), if plan_cache_mode is force_generic_plan,
the LockPolicy is always LOCK_UNPRUNED, because cprep is never NULL.
It seems that the LockPolicy has no chance of being LOCK_ALL. Am I missing
something here?
--
Thanks,
Tender Wang
* Re: generic plans and "initial" pruning
@ 2025-11-25 01:56 ` Amit Langote <[email protected]>
0 siblings, 0 replies; 66+ messages in thread
From: Amit Langote @ 2025-11-25 01:56 UTC
To: Tender Wang <[email protected]>; +Cc: Tom Lane <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Sun, Nov 23, 2025 at 9:17 PM Tender Wang <[email protected]> wrote:
> On Thu, Nov 20, 2025 at 15:30, Amit Langote <[email protected]> wrote:
>>
>> On Mon, Nov 17, 2025 at 9:50 PM Amit Langote <[email protected]> wrote:
>> > On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]> wrote:
>> > > * Enable pruning-aware locking in cached / generic plan reuse (0004):
>> > > extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
>> > > on each PlannedStmt in the CachedPlan, locking only surviving
>> > > partitions. Adds CachedPlanPrepData to pass this through plan cache
>> > > APIs and down to execution via QueryDesc. Also reinstates the
>> > > firstResultRel locking rule added in 28317de72 but later lost due to
>> > > revert of the earlier pruning patch, to ensure correctness when all
>> > > target partitions are pruned.
>> >
>> > Looking at the changes to executor/function.c, I also noticed that I
>> > had mistakenly allocated the ExecutorPrep state in
>> > SQLFunctionCache.fcontext whereas the correct context for execution
>> > related state is SQLFunctionCache.subcontext. In the updated patch,
>> > I've made postquel_start() reparent the prep EState's es_query_cxt to
>> > subcontext from fcontext. I also did not have a test case that
>> > exercised cached plan reuse for SQL functions, so I added one. I split
>> > the function.c's GetCachedPlan() + CachedPlanPrepData plumbing into a
>> > new patch 0005 so it can be reviewed separately, since it is the only
>> > non-mechanical call-site change.
>>
>> I also noticed a bug in the prep cleanup logic that runs when a cached
>> plan becomes invalid during the prep phase. Patch 0005 fixes that and
>> adds a regression test that exercises the invalidation path. This will
>> be folded into 0004 later.
>
> I spent time looking at these patches.
>
> I searched all the places that call GetCachedPlan(), and we always pass &cprep (a CachedPlanPrepData) to GetCachedPlan().
> In PrepAndCheckCachedPlan(), if plan_cache_mode is force_generic_plan, the LockPolicy is always LOCK_UNPRUNED, because cprep is never NULL.
> It seems that the LockPolicy has no chance of being LOCK_ALL. Am I missing something here?
You're right. Eventually, LockPolicy may end up redundant and we might
not need AcquireExecutorLocksPolicy() at all, with a single locking path
covering both cases.
My goal initially was to stage the changes across call sites: keep a
LOCK_ALL path for callers that still use the old lock-everything-up-front
behaviour, and gradually convert other callers to pass a
non-NULL CachedPlanPrepData and handle the prep_list it may return, so
that GetCachedPlan() can perform LOCK_UNPRUNED locking internally.
That is why GetCachedPlan() accepts a possibly NULL cprep and why
LockPolicy exists as a separate knob.
For example, I decided to split out the function.c refactoring of plan
cache usage into its own patch. That made me realise that new users of
GetCachedPlan() may appear that first adopt the simpler LOCK_ALL
behaviour and only later switch to LOCK_UNPRUNED when pruning-aware
locking becomes useful for them. Keeping the two paths preserves that
incremental route and avoids forcing every new user to adopt
CachedPlanPrepData and LOCK_UNPRUNED locking up front. I am still
undecided whether that two-path structure is a good idea, but I am
inclined to keep it for now. I would be happy to hear opinions on this.
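
To make the two paths concrete, here is a minimal, self-contained sketch of the selection rule being discussed. It is not code from the patch: the enum layout, the struct contents, and the helper name choose_lock_policy() are assumptions made purely for illustration. Only the names LOCK_ALL, LOCK_UNPRUNED, LockPolicy and CachedPlanPrepData come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in the patch. */
typedef enum LockPolicy
{
	LOCK_ALL,					/* lock every relation in the plan up front */
	LOCK_UNPRUNED				/* lock only relations surviving initial pruning */
} LockPolicy;

typedef struct CachedPlanPrepData
{
	void	   *prep_list;		/* placeholder for per-statement prep results */
} CachedPlanPrepData;

/*
 * Assumed rule: a caller that passes no CachedPlanPrepData keeps the old
 * lock-everything behaviour; a caller that passes one opts in to
 * pruning-aware locking.
 */
static LockPolicy
choose_lock_policy(const CachedPlanPrepData *cprep)
{
	return (cprep == NULL) ? LOCK_ALL : LOCK_UNPRUNED;
}
```

Under this assumed rule, the observation above follows directly: as long as every in-tree caller passes a non-NULL pointer, the LOCK_ALL branch is unreachable in practice and exists only for future callers.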
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-11-24 03:29 ` Chao Li <[email protected]>
2025-11-25 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
1 sibling, 1 reply; 66+ messages in thread
From: Chao Li @ 2025-11-24 03:29 UTC
To: Amit Langote <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi, Amit,
Locking only surviving partitions sounds like a good optimization. I have started reviewing this patch, but I cannot finish in one day. I will post my comments as I finish reviewing each commit.
> On Nov 20, 2025, at 15:30, Amit Langote <[email protected]> wrote:
>
> <v3-0004-Use-pruning-aware-locking-in-cached-plans.patch><v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch><v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch><v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch><v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch><v3-0001-Refactor-partition-pruning-initialization-for-cla.patch>
0001 splits the creation of es_part_prune_states into a new function, ExecCreatePartitionPruneStates(), to make the code clearer, as stated in the commit message. However, the new function is never called, so 0001 is not self-contained, which is unusual compared with the patches I have reviewed so far. I would suggest having ExecDoInitialPruning() call ExecCreatePartitionPruneStates() when es_part_prune_states is still NIL, so that the current logic is unchanged and 0001 can be pushed independently.
0002 moves the permission checks and related logic from InitPlan() into the new function ExecutorPrep(). The commit message says “executor setup logic unchanged”. However, the old code did not call PushActiveSnapshot() before the permission check, whereas the patch does, which may change behavior. Why was PushActiveSnapshot() added?
I am still trying to understand 0002-0004, and it will take me some time to fully understand the patches, so I wanted to raise the above comments first. I will continue reviewing tomorrow.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: generic plans and "initial" pruning
@ 2025-11-25 08:31 ` Amit Langote <[email protected]>
2026-02-11 04:05 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2025-11-25 08:31 UTC
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi Evan,
On Mon, Nov 24, 2025 at 12:30 PM Chao Li <[email protected]> wrote:
>
> Hi, Amit,
>
> Locking only surviving partitions sounds like a good optimization. I have started reviewing this patch, but I cannot finish in one day. I will post my comments as I finish reviewing each commit.
Thank you very much for taking the time to review.
> > On Nov 20, 2025, at 15:30, Amit Langote <[email protected]> wrote:
> >
> > <v3-0004-Use-pruning-aware-locking-in-cached-plans.patch><v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch><v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch><v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch><v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch><v3-0001-Refactor-partition-pruning-initialization-for-cla.patch>
>
>
> 0001 splits the creation of es_part_prune_states into a new function, ExecCreatePartitionPruneStates(), to make the code clearer, as stated in the commit message. However, the new function is never called, so 0001 is not self-contained, which is unusual compared with the patches I have reviewed so far.
Oops, that is not intentional.
> I would suggest having ExecDoInitialPruning() call ExecCreatePartitionPruneStates() when es_part_prune_states is still NIL, so that the current logic is unchanged and 0001 can be pushed independently.
0002 adds a call to ExecDoInitialPruning() in ExecutorPrep(), preceded
by a call to ExecCreatePartitionPruneStates(), and that is how I think
it should be. So in the attached updated 0001, I have made InitPlan()
call ExecCreatePartitionPruneStates() before calling
ExecDoInitialPruning().
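
As a rough illustration of that ordering, the toy model below builds the prune states first and only then runs initial pruning. It is a sketch under stated assumptions, not patch code: ToyEState and the toy_* functions are hypothetical stand-ins for EState, ExecCreatePartitionPruneStates(), ExecDoInitialPruning() and InitPlan(), and the two boolean flags merely model "es_part_prune_states has been built" and "initial pruning has run".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of EState. */
typedef struct ToyEState
{
	bool		prune_states_built;		/* models es_part_prune_states != NIL */
	bool		initial_pruning_done;	/* models es_part_prune_results being set */
} ToyEState;

/* Models ExecCreatePartitionPruneStates(). */
static void
toy_create_prune_states(ToyEState *estate)
{
	estate->prune_states_built = true;
}

/* Models ExecDoInitialPruning(); relies on the caller having built the states. */
static void
toy_do_initial_pruning(ToyEState *estate)
{
	assert(estate->prune_states_built);
	estate->initial_pruning_done = true;
}

/* Models the caller: creation is an explicit step that precedes pruning. */
static void
toy_init_plan(ToyEState *estate)
{
	toy_create_prune_states(estate);
	toy_do_initial_pruning(estate);
}
```

The alternative suggested in the review would instead have the pruning step build the states lazily when they are missing; keeping creation explicit in the caller means the same two-step sequence can be reused unchanged by other callers.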
> 0002 moves the permission checks and related logic from InitPlan() into the new function ExecutorPrep(). The commit message says “executor setup logic unchanged”. However, the old code did not call PushActiveSnapshot() before the permission check, whereas the patch does, which may change behavior. Why was PushActiveSnapshot() added?
That is a valid concern.
I found it necessary because the initial pruning code (which runs in
ExecDoInitialPruning()) may require ActiveSnapshot to be valid if
pruning expressions end up calling code that invokes
EnsurePortalSnapshotExists(). That requirement already existed when
ExecDoInitialPruning() was driven from ExecutorStart(), but
ExecutorPrep() can now be called from places that do not otherwise
push a snapshot. The snapshot push is only there to cover those
callers. It does not change permission checking itself, it just
ensures ExecutorPrep() runs with the same preconditions that
ExecutorStart() always had.
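
The push-only-if-needed pattern described above can be illustrated with a toy model. Everything in it is a stand-in: the counter snapshot_depth and the functions active_snapshot_set(), push_active_snapshot(), pop_active_snapshot() and prep_with_snapshot() are hypothetical and only mimic the shape of ActiveSnapshotSet(), PushActiveSnapshot() and PopActiveSnapshot(), which operate on a real snapshot stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: depth of a pretend active-snapshot stack. */
static int	snapshot_depth = 0;

static bool
active_snapshot_set(void)
{
	return snapshot_depth > 0;
}

static void
push_active_snapshot(void)
{
	snapshot_depth++;
}

static void
pop_active_snapshot(void)
{
	assert(snapshot_depth > 0);
	snapshot_depth--;
}

/*
 * Hypothetical prep routine: guarantees that a snapshot is active while the
 * work runs and leaves the stack exactly as it found it.  Returns the stack
 * depth observed while the work was running.
 */
static int
prep_with_snapshot(void)
{
	bool		pushed = false;
	int			depth_during_work;

	if (!active_snapshot_set())
	{
		push_active_snapshot();
		pushed = true;
	}

	/* Pruning expressions would be evaluated here. */
	depth_during_work = snapshot_depth;

	/* Pop only what this routine pushed itself. */
	if (pushed)
		pop_active_snapshot();

	return depth_during_work;
}
```

In this model, a caller that already holds a snapshot sees no change at all, and a caller without one gets a temporary snapshot that is gone again on return, which is the sense in which the push only covers callers that did not set one up.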
> I am still trying to understand 0002-0004, and it will take me some time to fully understand the patches, so I wanted to raise the above comments first. I will continue reviewing tomorrow.
Thanks, I appreciate your review.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v4-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (28.8K, 2-v4-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From a004aab1ce9418a2f6273d1a67673b3d4a7c218b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v4 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present.
ExecutorStart() now expects QueryDesc->prep to point at such an
ExecPrep object. If no prep was supplied by the caller, it
invokes ExecutorPrep() itself and adopts the resulting EState
for the duration of the query. This keeps the executor startup
behaviour unchanged while making the setup work callable
separately when needed.
CreateQueryDesc() grows a prep argument and stores it in the
QueryDesc. Portals, SPI, SQL functions, and EXPLAIN are wired
to carry an optional ExecPrep pointer alongside the PlannedStmt
list, but most callers still pass NULL and let ExecutorStart()
perform the setup lazily.
Add the ExecPrep struct and ExecPrepCleanup() to encapsulate
ownership of the prepared EState and any caller specific
cleanup hook. Update executor/README and related comments to
document the new control flow and the separation between
preparation and execution.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 8 +-
src/backend/executor/execMain.c | 180 ++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 11 ++
src/include/nodes/execnodes.h | 48 +++++++
src/include/utils/portal.h | 2 +
21 files changed, 286 insertions(+), 41 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index ebc204c4462..9429fc2d17d 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..95b5ec58c55 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,16 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Performs range
+ table initialization, permission checks, and initial partition pruning.
+ Returns an ExecPrep wrapper with EState that ExecutorStart may reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f5f4986383d..39de0b93a1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -171,8 +171,26 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep == NULL)
+ queryDesc->prep = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ Assert(queryDesc->prep);
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +281,136 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later.
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+			 bool do_initial_pruning, int eflags)
+{
+	ResourceOwner oldowner;
+	EState	   *estate;
+	bool		snapshot_set;
+
+	if (pstmt->commandType == CMD_UTILITY)
+		return NULL;
+
+	/* Pruning may use expressions that require an active snapshot. */
+	snapshot_set = false;
+	if (!ActiveSnapshotSet())
+	{
+		PushActiveSnapshot(GetTransactionSnapshot());
+		snapshot_set = true;
+	}
+	Assert(ActiveSnapshotSet());
+
+	estate = CreateExecutorState();
+	estate->es_plannedstmt = pstmt;
+	estate->es_part_prune_infos = pstmt->partPruneInfos;
+	estate->es_param_list_info = params;
+	estate->es_top_eflags = eflags;
+
+	/*
+	 * Do permissions checks.
+	 */
+	ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+	/*
+	 * Initialize range table.
+	 */
+	ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+					   bms_copy(pstmt->unprunableRelids));
+
+	/*
+	 * Ensure locks taken during initial pruning are tracked under the given
+	 * ResourceOwner (e.g., one associated with CachedPlan validation).
+	 */
+	oldowner = CurrentResourceOwner;
+	CurrentResourceOwner = owner;
+
+	/*
+	 * Set up PartitionPruneState structures needed for both initial and
+	 * runtime partition pruning. These structures are built from the
+	 * PartitionPruneInfo entries in the plan tree.
+	 *
+	 * If do_initial_pruning is true, also perform initial pruning to compute
+	 * the subset of child subplans that will be executed. The results,
+	 * which are bitmapsets of selected child indexes, are saved in
+	 * es_part_prune_results. This list is parallel to es_part_prune_infos.
+	 *
+	 * In parallel workers, do_initial_pruning should be false -- they receive
+	 * es_part_prune_results from the leader process and should only initialize
+	 * the PartitionPruneStates.
+	 */
+	ExecCreatePartitionPruneStates(estate);
+	if (do_initial_pruning)
+		ExecDoInitialPruning(estate);
+
+	CurrentResourceOwner = oldowner;
+
+	/* Release snapshot if we got one */
+	if (snapshot_set)
+		PopActiveSnapshot();
+
+	return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +972,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,38 +985,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->prep);
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 61559642662..ac5e2ebee72 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2369,6 +2369,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..5880a574a06 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..3579926d4e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,16 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..8bdecd631bf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,54 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * ExecutorPrep() factors out executor setup steps such as initializing the
+ * range table, checking permissions, and executing initial partition pruning.
+ * ExecutorStart() can reuse the prepared EState instead of repeating that
+ * work, and other callers (such as plan cache validation) can use it without
+ * running the full plan.
+ */
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(void *prep);
+
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
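The ownership handoff between ExecutorPrep(), ExecutorStart() and ExecPrepCleanup() in the patch above can be illustrated with a small standalone sketch. This is not the patch's code. ToyState, ToyPrep and every toy_* function are hypothetical stand-ins, and malloc/free replace PostgreSQL memory contexts. It only shows the rule that cleanup frees the state when nobody adopted it and always runs the optional callback. Unlike ExecPrepCleanup() as posted, the toy cleanup also frees the wrapper itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for EState. */
typedef struct ToyState
{
	int			id;
} ToyState;

/* Counters that make the cleanup behavior observable. */
static int	toy_states_freed = 0;
static int	toy_callbacks_run = 0;

typedef void (*toy_cleanup_fn) (void *arg);

/* Hypothetical stand-in for ExecPrep. */
typedef struct ToyPrep
{
	ToyState   *state;
	bool		owns_state;		/* true until an executor adopts the state */
	toy_cleanup_fn cleanup;		/* optional caller-supplied hook */
	void	   *cleanup_arg;
} ToyPrep;

static void
toy_count_callback(void *arg)
{
	(void) arg;
	toy_callbacks_run++;
}

static void
toy_state_free(ToyState *state)
{
	free(state);
	toy_states_freed++;
}

/* Build a prep wrapper that initially owns its state. */
static ToyPrep *
toy_prep_create(int id, toy_cleanup_fn cleanup, void *cleanup_arg)
{
	ToyPrep    *prep = calloc(1, sizeof(ToyPrep));

	if (prep == NULL)
		return NULL;
	prep->state = malloc(sizeof(ToyState));
	if (prep->state == NULL)
	{
		free(prep);
		return NULL;
	}
	prep->state->id = id;
	prep->owns_state = true;
	prep->cleanup = cleanup;
	prep->cleanup_arg = cleanup_arg;
	return prep;
}

/* The "executor" takes over the state; the prep must no longer free it. */
static ToyState *
toy_prep_adopt(ToyPrep *prep)
{
	assert(prep != NULL);
	prep->owns_state = false;
	return prep->state;
}

/* Free the state only if still owned; always run the callback. */
static void
toy_prep_cleanup(ToyPrep *prep)
{
	if (prep == NULL)
		return;
	if (prep->state != NULL && prep->owns_state)
		toy_state_free(prep->state);
	if (prep->cleanup != NULL)
		prep->cleanup(prep->cleanup_arg);
	free(prep);
}
```

With this shape, a caller that never starts execution can call the cleanup function and get everything released, while a caller that adopted the state stays responsible for freeing it through its own path.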
[application/octet-stream] v4-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.1K, 3-v4-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From 695b2d630d1e0812de9e3d227a56fadf21a8b61a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v4 3/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.
While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..751590adcc9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ac5e2ebee72..dc4eac8a0a7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1873,6 +1873,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
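The consistency check in ExecCheckInitialPruningResults() relies on bms_nonempty_difference(a, b), which reports whether a has members that are not in b. The sketch below models that with 64-bit masks to show what such a one-directional test does and does not catch. toy_set and the toy_* helpers are hypothetical and are not PostgreSQL's Bitmapset API. Whether the patch intends a subset test rather than strict equality is not stated in the thread, so treat this only as an illustration of the set semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy set: bit n set means member n is present (0..63). */
typedef uint64_t toy_set;

static toy_set
toy_add_member(toy_set set, int member)
{
	assert(member >= 0 && member < 64);
	return set | ((toy_set) 1 << member);
}

/* True if a has at least one member that b lacks. */
static bool
toy_nonempty_difference(toy_set a, toy_set b)
{
	return (a & ~b) != 0;
}

/* True only if both sets have exactly the same members. */
static bool
toy_equal(toy_set a, toy_set b)
{
	return a == b;
}

/*
 * Model of the worker-side check: complain only when the worker's own
 * result contains something the leader's result does not.
 */
static bool
toy_worker_result_acceptable(toy_set worker, toy_set leader)
{
	return !toy_nonempty_difference(worker, leader);
}
```

Under this model a worker that computes a strict subset of the leader's surviving subplans passes the check, while a worker that finds an extra subplan fails it. An equality test would reject both cases.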
[application/octet-stream] v4-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (6.7K, 4-v4-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 5dc90ce54c7108d5335003da4f247a65803e42e7 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 17 Nov 2025 17:40:26 +0900
Subject: [PATCH v4 6/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
ExecPrep pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL-language functions to participate in the same
ExecutorPrep machinery as other plan cache users, which a later
patch will use to support pruning-aware locking.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 32 +++++++++++++++++++++++--
src/test/regress/expected/plancache.out | 30 +++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 27 +++++++++++++++++++++
3 files changed, 87 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index d81718ea84e..ed7352fce61 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i++) :
+ NULL;
execution_state *newes;
/*
@@ -764,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,8 +1378,20 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ if (es->prep)
+ {
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per-call lifetime.
+ */
+ EState *prep_estate = es->prep->prep_estate;
+
+ MemoryContextSetParent(prep_estate->es_query_cxt, fcache->subcontext);
+ }
+
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 26c4c5e10fd..bf937364716 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -458,4 +458,34 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index cc7eb4da4d3..71320799040 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -272,4 +272,31 @@ explain (verbose, costs off) execute inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v4-0004-Use-pruning-aware-locking-in-cached-plans.patch (24.5K, 5-v4-0004-Use-pruning-aware-locking-in-cached-plans.patch)
download | inline diff:
From f3c07bcc5a14a0b751d82771c97c95775cea2758 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v4 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.

Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.

This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.

To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
---
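Reader's note, not part of the patch: the lock-set arithmetic that
AcquireExecutorLocksUnpruned() performs can be sketched in plain C as
below. This is a toy model under stated assumptions. RelidSet (a 32-bit
mask standing in for Bitmapset), relid_bit() and extra_relids_to_lock()
are hypothetical names that do not exist in PostgreSQL. The sketch
follows the patch's order of operations: subtract the already-locked
unprunable relids from the set that survived initial pruning, then add
back the first result relation of each ModifyTable node even if pruning
removed it.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit N set means range-table index N. */
typedef uint32_t RelidSet;

/* Hypothetical helper: singleton set for one range-table index. */
static RelidSet
relid_bit(int rtindex)
{
    assert(rtindex > 0 && rtindex < 32);
    return (RelidSet) 1u << rtindex;
}

/*
 * Hypothetical helper: given the relids that survived initial pruning
 * (which include the unprunable ones), the unprunable relids that were
 * locked in an earlier step, and the first result relation of each
 * ModifyTable node, return the relids that the second locking step
 * should cover.
 */
static RelidSet
extra_relids_to_lock(RelidSet unpruned, RelidSet unprunable,
                     const int *first_result_rels, int nfirst)
{
    /* Skip what the first step already locked. */
    RelidSet    lock_relids = unpruned & ~unprunable;
    int         i;

    /* First result rels stay locked even if pruning removed them. */
    for (i = 0; i < nfirst; i++)
        lock_relids |= relid_bit(first_result_rels[i]);

    return lock_relids;
}
```

With unprunable = {1}, surviving = {1, 4} and a pruned first result
relation 3, the helper yields {3, 4}. As in the patch, a first result
relation that is also unprunable is added back, so it would be locked a
second time. That looks harmless because the release path mirrors the
acquire path, but it may deserve a comment.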
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 26 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 234 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 ++
src/include/utils/plancache.h | 24 ++-
11 files changed, 313 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..23332d19b37 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..d81718ea84e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e44f1223886..7de2328021b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4671,8 +4671,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..d580f1e0425 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2588,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5880a574a06..a96419edcbe 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2039,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..c1cfd47422c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting ExecPrep pointers are appended to
+ * cprep->prep_list in cprep->context. On release, the same ExecPrep
+ * list is consulted to determine which relations to unlock and is then
+ * cleaned up with ExecPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2022,158 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the ExecPrep pointer for each PlannedStmt to cprep->prep_list
+ *
+ * On release, it:
+ * - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - calls ExecPrepCleanup() on each ExecPrep
+ *
+ * prep_list is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for ExecPrep happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 46a8655621d..5af4c31f53a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..eb211f1ba56 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..c7b8ec4be39 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,27 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_list is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. The
+ * same list is later used to determine which relations to unlock when
+ * releasing execution locks.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* one ExecPrep per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate ExecPrep objects */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to pass to ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +261,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
[application/octet-stream] v4-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch (9.3K, 6-v4-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch)
download | inline diff:
From 774853b8d3c0f8d4ee1afc8329526e7d22987cab Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 20 Nov 2025 15:35:47 +0900
Subject: [PATCH v4 5/6] Add test exercising prep cleanup on cached-plan
invalidation
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.

The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
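Reader's note, not part of the patch: the control flow this test is
meant to reach can be sketched as a self-contained toy in C. Everything
here is hypothetical and only models the shape of
PrepAndCheckCachedPlan(): ToyPrep, ToyPrepData, toy_locks_held,
toy_prep_cleanup() and toy_prep_and_check() are invented names, and the
"lock" is just a counter. The point is the ordering on the race path:
release the locks using the prep list first, then dispose of the list so
the caller never sees stale state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for ExecPrep: remembers how many locks it took. */
typedef struct ToyPrep
{
    int         nlocked;
} ToyPrep;

/* Toy stand-in for CachedPlanPrepData. */
typedef struct ToyPrepData
{
    ToyPrep   **preps;      /* one slot per statement; NULL means no prep */
    int         nprep;
} ToyPrepData;

/* Toy lock table: just a count of locks currently held. */
static int  toy_locks_held = 0;

/* Free every prep entry and reset the list to empty. */
static void
toy_prep_cleanup(ToyPrepData *cprep)
{
    int         i;

    if (cprep == NULL)
        return;
    for (i = 0; i < cprep->nprep; i++)
        free(cprep->preps[i]);      /* free(NULL) is a no-op */
    free(cprep->preps);
    cprep->preps = NULL;
    cprep->nprep = 0;
}

/* Returns true if the "plan" is still valid after taking locks. */
static bool
toy_prep_and_check(ToyPrepData *cprep, int nstmts, bool still_valid)
{
    int         i;

    assert(cprep != NULL && nstmts > 0);
    cprep->preps = calloc((size_t) nstmts, sizeof(ToyPrep *));
    assert(cprep->preps != NULL);
    cprep->nprep = nstmts;

    /* Acquire: build one prep per statement and take its locks. */
    for (i = 0; i < nstmts; i++)
    {
        if (i % 2 == 1)
            continue;       /* pretend utility statement: slot stays NULL */
        cprep->preps[i] = malloc(sizeof(ToyPrep));
        assert(cprep->preps[i] != NULL);
        cprep->preps[i]->nlocked = 1;
        toy_locks_held += cprep->preps[i]->nlocked;
    }

    if (still_valid)
        return true;        /* caller keeps the prep list */

    /* Race case: release locks via the prep list, then discard it. */
    for (i = 0; i < nstmts; i++)
        if (cprep->preps[i] != NULL)
            toy_locks_held -= cprep->preps[i]->nlocked;
    toy_prep_cleanup(cprep);
    return false;
}
```

The toy tolerates NULL slots because free(NULL) is defined to do
nothing. Whether ExecPrepCleanup() accepts a NULL argument is not
visible in this patch, so the real cleanup loop may need an explicit
check for the NULL entries appended for utility statements.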
src/backend/utils/cache/plancache.c | 38 +++++++++++++--
src/test/regress/expected/plancache.out | 61 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 50 ++++++++++++++++++++
3 files changed, 144 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index c1cfd47422c..a9a4e11d1a5 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -103,6 +103,7 @@ static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -1033,6 +1034,9 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
/* Oops, the race case happened. Release useless locks. */
AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -2069,7 +2073,6 @@ LockRelids(List *rtable, Bitmapset *relids, bool acquire)
* - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
* (which must already be populated)
* - unlocks the same relations identified during acquire
- * - calls ExecPrepCleanup() on each ExecPrep
*
* prep_list is extended during acquire and must match stmt_list one-to-one
* when releasing locks. Memory allocation for ExecPrep happens in
@@ -2165,15 +2168,40 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
LockRelids(plannedstmt->rtable, lock_relids, acquire);
bms_free(lock_relids);
}
-
- /* Clean up prep if releasing locks. */
- if (!acquire)
- ExecPrepCleanup(prep);
}
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * CachedPlanPrepCleanup
+ * Clean up ExecPrep state built for a generic plan.
+ *
+ * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the ExecPrep list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+
+ if (cprep == NULL)
+ return;
+
+ foreach(lc, cprep->prep_list)
+ {
+ ExecPrep *prep = (ExecPrep *) lfirst(lc);
+
+ ExecPrepCleanup(prep);
+ }
+
+ list_free(cprep->prep_list);
+ cprep->prep_list = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..26c4c5e10fd 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,64 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..cc7eb4da4d3 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,53 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v4-0001-Refactor-partition-pruning-initialization-for-cla.patch (8.2K, 7-v4-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 2d7e972bf0e772b55674d6c390682777dc8c99a3 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v4 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
3 files changed, 44 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..f5f4986383d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -868,6 +868,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0dcce181f09..61559642662 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1773,6 +1772,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1797,6 +1799,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1804,11 +1829,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1826,20 +1851,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1847,8 +1865,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1966,14 +1982,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2207,8 +2221,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2220,8 +2234,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2026-02-11 04:05 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi,
Here is v5 of this patch series.
Changes since v4:
* Removed the ExecPrep struct. ExecutorPrep() now returns an EState
directly, which is carried through QueryDesc->estate and prep_estates
lists. This simplifies the interface and avoids the ownership tracking
(owns_estate) that ExecPrep required.
* Moved the PARAM_EXEC values setup from
CreatePartitionPruneState() to InitExecPartitionPruneContexts().
This allows ExecutorPrep() to create pruning state before the executor
has set up PARAM_EXEC slots.
* Added a standalone test (0003) that inspects pg_locks to show
baseline locking behavior: currently, all child partitions are locked
whether pruning is on or off. The pruning-aware locking patch (0004) then
updates the expected output, making the behavioral change visible as a
diff.
* Absorbed the plan invalidation test (v4's 0005) into the locking
patch (0004), since that's where CachedPlanPrepCleanup lives.
* Moved the parallel worker reuse patch to the end of the series
(0006), since the core functionality doesn't depend on it.
* Fixed the missing i++ in CheckInitialPruningResultsInWorker(),
renamed it from ExecCheckInitialPruningResults(), and made it a
debug-only check (#ifdef USE_ASSERT_CHECKING) that verifies the
worker's pruning result matches the leader's.
The series is:
0001: Refactor partition pruning initialization for clarity
0002: Introduce ExecutorPrep and refactor executor startup
0003: Add test for partition lock behavior with generic cached plans
0004: Use pruning-aware locking in cached plans
0005: Make SQL function executor track ExecutorPrep state
0006: Reuse partition pruning results in parallel workers
The core of this series is ExecutorPrep() in 0002 and its use by
GetCachedPlan() in 0004. This is where I'd appreciate the most
scrutiny.
ExecutorPrep() factors out permission checks, range table
initialization, and initial partition pruning from InitPlan() into a
callable helper that can run before ExecutorStart(). This is what
makes pruning-aware locking possible: GetCachedPlan() calls
ExecutorPrep() during plan validation to determine which partitions
survive initial pruning, then locks only those (plus firstResultRel
targets). The resulting EStates are passed through CachedPlanPrepData
and eventually adopted by ExecutorStart(), so the pruning work isn't
repeated.
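As a rough illustration of the "prune first, then lock only the survivors"
idea described above, here is a toy model. It is not code from the patch:
every type and function prefixed with Toy/toy_ is made up for this sketch,
partitions are reduced to a single list bound, and "locking" is just a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a list partition; not a PostgreSQL structure. */
typedef struct ToyPartition
{
    int  bound_value;   /* the single key value this partition accepts */
    bool locked;        /* set once the partition has been "locked" */
} ToyPartition;

/*
 * Toy "initial pruning": return a bitmask of the partitions whose bound
 * matches the pruning parameter.  Supports at most 32 partitions.
 */
unsigned
toy_initial_prune(const ToyPartition *parts, size_t nparts, int param)
{
    unsigned survivors = 0;

    assert(nparts <= 32);
    for (size_t i = 0; i < nparts; i++)
    {
        if (parts[i].bound_value == param)
            survivors |= 1u << i;
    }
    return survivors;
}

/*
 * Toy "pruning-aware locking": lock only the partitions that survived
 * pruning and return how many were locked.
 */
int
toy_lock_unpruned(ToyPartition *parts, size_t nparts, unsigned survivors)
{
    int nlocked = 0;

    assert(nparts <= 32);
    for (size_t i = 0; i < nparts; i++)
    {
        if (survivors & (1u << i))
        {
            parts[i].locked = true;
            nlocked++;
        }
    }
    return nlocked;
}
```

With two partitions bound to 1 and 2 and a pruning parameter of 1, only the
first partition ends up locked in this model; the second is never touched.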
The risk is that this pulls part of what we traditionally consider
"execution" into a phase that runs during plan cache validation.
Specifically:
* Snapshot use: ExecutorPrep() requires an active snapshot for pruning
expressions that may call PL functions. During plan cache validation,
we use a transaction snapshot rather than the portal's execution
snapshot. I believe this is safe because initial pruning has always
used whatever snapshot happens to be active at the time
ExecDoInitialPruning() runs, but it's a subtle change in when that
happens.
* EState memory lifetime: The EState created by ExecutorPrep() lives
longer than it used to, since it's created during GetCachedPlan() but
consumed later during ExecutorStart(). CachedPlanPrepData manages
this: it tracks the EStates and their resource owner, and
CachedPlanPrepCleanup() frees them if the plan is invalidated before
execution reaches ExecutorStart(). I've tried to ensure that the
EState's memory context hierarchy stays consistent with the current
system through the reparenting done in portal/SPI/function paths.
* Plan invalidation: If the plan is invalidated between ExecutorPrep()
and ExecutorStart() (e.g., DDL on a partition during pruning), the
prep state is discarded by CachedPlanPrepCleanup() and GetCachedPlan()
retries. The invalidation regression test in 0004 exercises this path.
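The invalidation case in the last bullet can be pictured with a small retry
loop. This is only a sketch under simplifying assumptions, not the patch's
logic: ToyPlan, toy_get_cached_plan(), the max_tries limit, and the
toy_ddl_pending flag are all hypothetical, and "replanning" is modeled as
bumping a generation counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cached plan; not a PostgreSQL structure. */
typedef struct ToyPlan
{
    bool is_valid;      /* cleared when "DDL" invalidates the plan */
    int  generation;    /* incremented each time the plan is rebuilt */
} ToyPlan;

/* Toy signal: when true, the next prep call performs "DDL" once. */
bool toy_ddl_pending = false;

/*
 * Toy prep step.  Like the regression test's pruning function, it
 * invalidates the plan once if asked to, then resets the signal.
 */
void
toy_prep_with_one_ddl(ToyPlan *plan)
{
    if (toy_ddl_pending)
    {
        toy_ddl_pending = false;
        plan->is_valid = false;
    }
}

/*
 * Toy validation loop: rebuild the plan if needed, run prep, and retry
 * if prep invalidated the plan.  Returns the number of attempts used,
 * or -1 if the plan never stayed valid within max_tries.
 */
int
toy_get_cached_plan(ToyPlan *plan, void (*prep)(ToyPlan *), int max_tries)
{
    assert(plan != NULL && prep != NULL);

    for (int attempt = 1; attempt <= max_tries; attempt++)
    {
        if (!plan->is_valid)
        {
            /* "Replan": build a fresh plan. */
            plan->generation++;
            plan->is_valid = true;
        }

        prep(plan);             /* may invalidate the plan */

        if (plan->is_valid)
            return attempt;     /* prep state can be handed to execution */

        /* Prep state would be discarded here before retrying. */
    }
    return -1;
}
```

In this model, a pending DDL makes the first attempt fail and the second
succeed against a rebuilt plan, which mirrors what the regression test
expects: the second EXECUTE replans and completes without an error.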
As of 0002, ExecutorStart() is the only caller of ExecutorPrep(), so
there is no behavioral change until 0004 wires it into
GetCachedPlan(). This makes it possible to review the refactoring in
isolation.
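The "reuse the EState if one was prepared earlier, otherwise prepare it
now" behavior can be sketched as follows. Again, this is a toy model and
not the patch itself: ToyEState, ToyQueryDesc, toy_executor_prep(), and
toy_executor_start() are invented names, and the prep work is reduced to
setting a flag and counting calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy executor state; not PostgreSQL's EState. */
typedef struct ToyEState
{
    bool pruning_done;  /* stands in for range table setup and pruning */
} ToyEState;

/* Toy query descriptor that may carry a pre-built state. */
typedef struct ToyQueryDesc
{
    ToyEState *estate;  /* NULL unless prep already ran */
} ToyQueryDesc;

/* Counts how often the prep work was actually performed. */
int toy_prep_count = 0;

/* Toy prep step: build the state once and record that it ran. */
ToyEState *
toy_executor_prep(void)
{
    ToyEState *estate = calloc(1, sizeof(ToyEState));

    if (estate != NULL)
    {
        estate->pruning_done = true;
        toy_prep_count++;
    }
    return estate;
}

/*
 * Toy startup: adopt an existing state if the caller supplied one,
 * otherwise run prep lazily.
 */
void
toy_executor_start(ToyQueryDesc *qd)
{
    assert(qd != NULL);

    if (qd->estate == NULL)
        qd->estate = toy_executor_prep();

    /* The rest of startup would proceed using qd->estate. */
}
```

A caller that passes no state gets the lazy path, and a caller that
prepared the state up front sees no second prep call in this model. That
is the property that lets the refactoring be reviewed on its own.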
I would really appreciate review from folks familiar with the executor
lifecycle and plan cache internals. The approach touches the boundary
between plan caching and execution, and getting the lifetime and
snapshot semantics right is important.
A note on 0006: it is intentionally last and independently droppable.
It passes the leader's pruning results to parallel workers via
nodeToString/stringToNode, which is the simplest approach but adds
serialization overhead to all parallel queries regardless of whe
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v5-0001-Refactor-partition-pruning-initialization-for-cla.patch (10.2K, 2-v5-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 1c54a46145f776db912cab83af5112541bccf703 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 15:08:52 +0900
Subject: [PATCH v5 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
ExecCreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called at a time when the PARAM_EXEC
parameters have not yet been set up.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 103 +++++++++++++++++++--------
src/include/executor/execPartition.h | 1 +
3 files changed, 74 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..654f9246ad0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -868,6 +868,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d13e786cf13..7cdd7d45c6a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,8 +184,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1920,6 +1919,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1944,6 +1946,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1951,11 +1976,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1973,20 +1998,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1994,8 +2012,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2113,14 +2129,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2354,8 +2368,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2367,10 +2381,29 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
+ }
+#endif
+ }
j++;
}
@@ -2467,9 +2500,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2497,9 +2531,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when ExecCreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
[application/octet-stream] v5-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (25.9K, 3-v5-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From e5898351861518fba28f1b0f7ff6ea8a1f5a94bb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v5 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep() builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present, and returns it
directly to the caller.
ExecutorStart() now checks if QueryDesc->estate is already set
(indicating ExecutorPrep() was called earlier). If so, it reuses
the EState to avoid redoing range table setup and pruning.
Otherwise, it invokes ExecutorPrep() itself and adopts the
resulting EState for the duration of the query. This keeps the
executor startup behavior unchanged while making the setup work
callable separately when needed.
CreateQueryDesc() grows a prep_estate argument to accept an
optionally pre-created EState and stores it in the QueryDesc.
Portals, SPI, SQL functions, and EXPLAIN are wired to carry
optional EState pointers alongside the PlannedStmt list, but most
callers still pass NULL and let ExecutorStart() perform the setup
lazily.
ExecutorPrep() requires the caller to have established an active
snapshot, as partition pruning expressions may call PL functions
that internally require one (e.g., via EnsurePortalSnapshotExists()).
Update executor/README and related comments to document the new
control flow and the separation between preparation and execution.
Note that as of this commit, ExecutorStart() is the only caller of
ExecutorPrep(), so there is no semantic change in behavior. Later
commits will add specialized callers that invoke ExecutorPrep()
earlier to enable pruning-aware locking in cached plans.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 ++-
src/backend/executor/execMain.c | 123 +++++++++++++++++++++-------
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 ++++--
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 ++++++
src/include/nodes/execnodes.h | 1 -
src/include/utils/portal.h | 2 +
20 files changed, 188 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..ef1ee2568c6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -875,7 +875,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b7bb111688c..a5db3ed788e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -550,7 +551,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 596105ee078..5743caa0506 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1043,7 +1043,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 5b86a727587..005fbb48aa5 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,7 +576,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -650,14 +653,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 654f9246ad0..8c50f45a5c5 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -145,7 +145,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -171,9 +170,19 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ */
+ if (queryDesc->estate == NULL)
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +272,84 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an EState that can be reused later.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -838,38 +925,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f87978c137e..2d3c5d6123e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1299,7 +1299,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 4ca342a43ef..c93e2664cfd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1368,7 +1368,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3019a3b2b97..994a69a1c8e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2499,6 +2500,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2577,6 +2580,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2614,9 +2618,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2694,7 +2700,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 02e9aaa6bca..5541c574c8b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1230,6 +1230,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2029,6 +2030,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index c1a53e658cb..941e95010c3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -297,6 +298,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 86226f8db70..3756a11345f 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 55a7d930d26..f7f922bfaa3 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc. Safe when
+ * prep_estates is NIL; just pass list_head(NIL) which is NULL.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f8053d9e572..70acfe3ad90 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -774,7 +774,6 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
-
/*
* ExecRowMark -
* runtime representation of FOR [KEY] UPDATE/SHARE clauses
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
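Reader's note, not part of the patch: the lockstep walk that next_prep_estate() performs alongside a statement list can be modelled without any PostgreSQL headers. The sketch below assumes a plain singly linked list (ToyCell) in place of List/ListCell; toy_next_prep() and toy_count_prepared() are hypothetical names. It shows why an empty prep list is safe: every statement simply sees NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list cell; stands in for a ListCell inside a List. */
typedef struct ToyCell
{
    void           *value;
    struct ToyCell *next;
} ToyCell;

/*
 * Return the value of the current cell (or NULL when the list is
 * exhausted or empty) and advance *lc to the following cell.
 */
static void *
toy_next_prep(ToyCell **lc)
{
    void *result = NULL;

    if (*lc != NULL)
    {
        result = (*lc)->value;
        *lc = (*lc)->next;
    }
    return result;
}

/*
 * Iterate over nstmts statements while walking the prep list in step,
 * and count how many statements received a non-NULL prepared state.
 */
static int
toy_count_prepared(ToyCell *prep_head, int nstmts)
{
    ToyCell *lc = prep_head;
    int      found = 0;

    for (int i = 0; i < nstmts; i++)
    {
        if (toy_next_prep(&lc) != NULL)
            found++;
    }
    return found;
}
```

In this model a NULL head behaves like NIL: the loop body runs once per statement and each call returns NULL, so callers that pass NIL in this commit fall through to the "no prepared state" path for every statement.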
[application/octet-stream] v5-0003-Add-test-for-partition-lock-behavior-with-generic.patch (5.3K, 4-v5-0003-Add-test-for-partition-lock-behavior-with-generic.patch)
From 0a2e08b85986b14a26cf6226c6fb1e5094b3a173 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:00:32 +0900
Subject: [PATCH v5 3/6] Add test for partition lock behavior with generic
cached plans
Add a regression test that inspects pg_locks to verify which child
partitions are locked when executing a prepared statement that uses
a generic cached plan.
Two cases are tested: one with enable_partition_pruning on and one
with it off. Currently both cases lock all child partitions, because
GetCachedPlan() acquires execution locks on every relation in the
plan regardless of pruning.
A subsequent commit that adds pruning-aware locking will update the
expected output for the pruning-enabled case, showing that only the
surviving partition is locked.
---
src/test/regress/expected/partition_prune.out | 83 +++++++++++++++++++
src/test/regress/sql/partition_prune.sql | 55 ++++++++++++
2 files changed, 138 insertions(+)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..39dab8fcc05 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,86 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..229c5eb370c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,58 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
--
2.47.3
[application/octet-stream] v5-0004-Use-pruning-aware-locking-in-cached-plans.patch (37.6K, 5-v5-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From e3da29f24c92a82a2d38929c1aa96fe72cc98a0b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v5 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.
Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.
The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
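Reader's note, not part of the patch: the locking rule described in the commit message above can be pictured as a set union. The sketch is a toy model only. It assumes relations are identified by small range table indexes stored in a 32-bit mask, and the names ToyRelids, toy_relids_to_lock() and toy_is_locked() are hypothetical. It does not reproduce how plancache.c computes or acquires the locks; it only illustrates that a pruned partition is skipped unless it is a first result relation.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set means range table index i is in the set (toy: i in 0..31). */
typedef uint32_t ToyRelids;

/* Build a one-member set for range table index rti. */
static ToyRelids
toy_relid(int rti)
{
    return (ToyRelids) 1u << rti;
}

/*
 * Relations to lock when reusing a generic plan, in this model:
 *   - relations that can never be pruned,
 *   - partitions that survived initial pruning,
 *   - first result relations, locked even if pruned.
 */
static ToyRelids
toy_relids_to_lock(ToyRelids unprunable,
                   ToyRelids surviving,
                   ToyRelids first_result_rels)
{
    return unprunable | surviving | first_result_rels;
}

/* Is range table index rti a member of the set? */
static int
toy_is_locked(ToyRelids set, int rti)
{
    return (int) ((set >> rti) & 1u);
}
```

For a SELECT on a parent at index 1 with partitions at indexes 2 to 4, where only index 2 survives pruning, the model locks indexes 1 and 2. For an UPDATE where every partition is pruned but index 2 is the first result relation, index 2 is still locked, matching the rule the commit message says it reinstates.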
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 26 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 20 ++
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 292 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 27 +-
src/test/regress/expected/partition_prune.out | 50 ++-
src/test/regress/expected/plancache.out | 62 ++++
src/test/regress/sql/partition_prune.sql | 24 +-
src/test/regress/sql/plancache.sql | 51 +++
15 files changed, 574 insertions(+), 26 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 005fbb48aa5..e8cd47131ce 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c93e2664cfd..65dfae58dcf 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f5e9d369940..fc7ff46f86a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4664,8 +4664,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4679,6 +4679,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 994a69a1c8e..13703969dd8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2502,6 +2512,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2576,11 +2587,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 757bdc7b1de..10470297bdb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -655,6 +655,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 16d200cfb46..d20a66e3e37 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -381,6 +381,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. There is one ModifyTable
+ * per query level, so this captures exactly one entry per level.
+ * ExecInitModifyTable() asserts that the recorded index matches
+ * what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5541c574c8b..b749b9c8d1a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1636,6 +1636,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2017,7 +2018,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2030,7 +2035,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 37d5d73b7fb..305fe912586 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,9 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +140,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +961,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The EStates created to do so are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +975,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1003,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1033,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1283,6 +1316,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the EStates in
+ * cprep->prep_estates. These are intended to be passed later to
+ * ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1331,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1354,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1901,6 +1942,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting EStates are appended to
+ * cprep->prep_estates in cprep->context. On release, the same EState
+ * list is consulted to determine which relations to unlock; the EStates
+ * themselves are freed afterwards by CachedPlanPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1953,6 +2026,211 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ if (!(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
+ elog(ERROR, "LockRelids(): cannot lock relation at RT index %d",
+ rtindex);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - leaves the EStates for CachedPlanPrepCleanup() to free
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Always lock the first result relation of each ModifyTable
+ * node in the plan, that is, each entry of
+ * plannedstmt->firstResultRels, even if it was pruned.
+ * ExecInitModifyTable() keeps that relation when all of a
+ * node's result relations have been pruned; see the comments
+ * there. This can be wasteful, because the first result
+ * relation is not needed at all if the node has other
+ * unpruned result relations. Unfortunately, we don't have a
+ * per-node unpruned_relids set to determine whether that is
+ * the case.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up the EStates built for a generic plan.
+ *
+ * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the EState list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+
+ if (cprep == NULL)
+ return;
+
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index fb808823acf..653bd46ce05 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -214,6 +214,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4bc6fb5670e..9e6106751cb 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 984c51515c6..766a11d92a0 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,30 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. The
+ * same list is later used to determine which relations to unlock when
+ * releasing execution locks.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags should be set properly if it affects initial pruning, for example,
+ * if running EXPLAIN (GENERIC_PLAN).
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +264,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 39dab8fcc05..39770f3b6d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4860,9 +4860,7 @@ select c.relname
relname
--------------
prunelock_p1
- prunelock_p2
- prunelock_p3
-(3 rows)
+(1 row)
commit;
deallocate prunelock_q;
@@ -4904,6 +4902,50 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 229c5eb370c..87672ad40f7 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1499,6 +1499,28 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
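For readers following the locking logic in AcquireExecutorLocksUnpruned(), here is a minimal standalone sketch of how the second lock set is derived: the relations surviving initial pruning, minus the unprunable ones that were already locked, plus each recorded first result relation. It is not code from the patch. RelidSet is a hypothetical 64-bit stand-in for Bitmapset, and relid_set_add(), relid_set_contains() and compute_extra_lock_set() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a Bitmapset: bit i set means RT index i is a
 * member.  Only RT indexes 1..63 are representable in this sketch.
 */
typedef uint64_t RelidSet;

static RelidSet
relid_set_add(RelidSet set, int rtindex)
{
	assert(rtindex > 0 && rtindex < 64);
	return set | (UINT64_C(1) << rtindex);
}

static int
relid_set_contains(RelidSet set, int rtindex)
{
	assert(rtindex > 0 && rtindex < 64);
	return (int) ((set >> rtindex) & 1);
}

/*
 * Compute the relations that still need locking once the unprunable ones
 * have been locked: (unpruned minus unprunable), plus every first result
 * relation, which must be locked even when pruning removed it.
 */
static RelidSet
compute_extra_lock_set(RelidSet unpruned, RelidSet unprunable,
					   const int *first_result_rels, int nfirst)
{
	RelidSet	lock_relids = unpruned & ~unprunable;

	for (int i = 0; i < nfirst; i++)
		lock_relids = relid_set_add(lock_relids, first_result_rels[i]);

	return lock_relids;
}
```

With RT index 1 unprunable, RT index 3 surviving pruning, and RT index 2 recorded as a first result relation, the computed set holds 2 and 3 but not 1. That mirrors the intent of the bms_difference() and firstResultRels loop in the patch, under the simplifying assumptions above.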
[application/octet-stream] v5-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 6-v5-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From ebe3d97df5366f7bf741962f440fb34d5dc3d16a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v5 5/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 65dfae58dcf..c70e06d8886 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -764,6 +778,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,6 +1377,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1370,7 +1394,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1461,6 +1485,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v5-0006-Reuse-partition-pruning-results-in-parallel-worke.patch (8.2K, 7-v5-0006-Reuse-partition-pruning-results-in-parallel-worke.patch)
From 921114fc472b80a0ecb59c98282642412a6ce31c Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v5 6/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce CheckInitialPruningResultsInWorker() (debug builds only)
to verify that the results match what the worker would compute. This
check helps catch inconsistencies across leader and worker pruning
logic.
---
src/backend/executor/execParallel.c | 108 +++++++++++++++++++++++++++-
1 file changed, 107 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 2d3c5d6123e..0eb28cbaa1e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -66,6 +67,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -140,6 +143,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -619,12 +624,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -653,6 +664,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -679,6 +692,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -780,6 +803,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1279,10 +1312,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1295,12 +1333,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-07 09:54 ` Amit Langote <[email protected]>
2026-03-09 04:41 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-03-07 09:54 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi,
Attached is v6 of the patch series. I've been working toward
committing this, so I wanted to lay out the ExecutorPrep() design and
the key trade-offs before doing so.
When a cached generic plan references a partitioned table,
GetCachedPlan() locks all partitions upfront via
AcquireExecutorLocks(), even those that initial pruning will
eliminate. But initial partition pruning only runs later during
ExecutorStart(). Moving pruning earlier requires some executor setup
(range table, permissions, pruning state), and ExecutorPrep() is the
vehicle for that. Unlike the approach that was reverted last May, this
keeps the CachedPlan itself unchanged -- all per-execution state flows
through a separate CachedPlanPrepData that the caller provides.
The approach also keeps GetCachedPlan()'s interface
backward-compatible: the new CachedPlanPrepData argument is optional.
If a caller passes NULL, all partitions are locked as before and
nothing changes. This means existing callers and any new code that
calls GetCachedPlan() without caring about pruning-aware locking just
works.
The risk is on the other side: if a caller does pass a
CachedPlanPrepData, GetCachedPlan() will lock only the surviving
partitions and populate prep_estates with the EStates that
ExecutorPrep() created. The caller then must make those EStates
available to ExecutorStart() -- via QueryDesc->estate,
portal->prep_estates, or the equivalent path for SPI and SQL
functions. If it fails to do so, ExecutorStart() will call
ExecutorPrep() again, which may compute different pruning results than
the original call, potentially expecting locks on relations that were
never acquired. The executor would then operate on relations it
doesn't hold locks on.
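
To make the opt-in behavior and its failure mode concrete, here is a toy
model in plain C. It is only a sketch under stated assumptions: every name
in it (ToyPrepData, toy_get_cached_plan, toy_executor_start, and so on) is
hypothetical and none of it is PostgreSQL code. Partitions are bits in a
mask, "pruning" keeps the single partition matching a key, and a changed
key at executor start stands in for pruning that computes a different
result the second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are PostgreSQL APIs. */
#define TOY_NPARTS 4

typedef struct ToyPrepData
{
    uint32_t unpruned;  /* out: partitions that survived "initial pruning" */
    bool     filled;    /* out: true once the prep step has run */
} ToyPrepData;

/* Stand-in for initial pruning: keep only the partition matching the key. */
static uint32_t
toy_initial_prune(int key)
{
    if (key < 0 || key >= TOY_NPARTS)
        return 0;
    return UINT32_C(1) << key;
}

/*
 * Stand-in for the plan cache lookup.  Returns the mask of partitions
 * that were "locked".
 *
 * prep == NULL: legacy behavior, every partition is locked.
 * prep != NULL: prune first, lock only the survivors, and hand the
 * pruning result back so the caller can pass it on to execution.
 */
static uint32_t
toy_get_cached_plan(int key, ToyPrepData *prep)
{
    uint32_t all = (UINT32_C(1) << TOY_NPARTS) - 1;

    if (prep == NULL)
        return all;

    prep->unpruned = toy_initial_prune(key);
    prep->filled = true;
    return prep->unpruned;
}

/*
 * Stand-in for executor startup.  If the caller delivers the prep result,
 * it is reused; otherwise pruning is redone with whatever key is visible
 * now.  Returns true when every partition the executor will touch is
 * covered by the locks taken earlier.
 */
static bool
toy_executor_start(uint32_t locked, const ToyPrepData *prep, int key_at_start)
{
    uint32_t needed = (prep != NULL && prep->filled)
        ? prep->unpruned
        : toy_initial_prune(key_at_start);

    return (needed & ~locked) == 0;
}
```

In this model, a caller that passes NULL is always safe because everything
is locked, and a caller that passes a ToyPrepData and forwards it is safe
because execution reuses the same pruning result. Only the caller that opts
in and then drops the result can end up touching an unlocked partition.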
So the contract is: if you opt in to pruning-aware locking by passing
CachedPlanPrepData, you must complete the pipeline by delivering the
prep EStates to the executor. In the current patch, all the call sites
that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
EXPLAIN) do thread the EStates through correctly, and I've tried to
make the plumbing straightforward enough that it's hard to get wrong.
But it is a new invariant that didn't exist before, and a caller that
gets it wrong would fail silently rather than with an obvious error.
To catch such violations, I've added a debug-only check in
standard_ExecutorStart() that fires when no prep EState was provided.
It iterates over the plan's rtable and verifies that every lockable
relation is actually locked. That should always hold if
AcquireExecutorLocks() locked everything, but would fail if
pruning-aware locking happened upstream and the caller dropped the
prep EState. The check is skipped in parallel workers, which acquire
relation locks lazily in ExecGetRangeTableRelation().
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
So the invariant is: if no prep EState was provided, every relation in
the plan is locked; if one was provided, at least the unpruned
relations are locked. Both are checked in assert builds.
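
The two-sided invariant can also be written down as a single predicate.
The following is a minimal, self-contained C sketch rather than the actual
check: ToyQuery and toy_lock_invariant_holds are hypothetical names, and
bit i of each mask stands for range table entry i.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model; bit i of each mask stands for range table entry i. */
typedef struct ToyQuery
{
    uint32_t lockable;   /* all lockable relations in the plan */
    uint32_t locked;     /* relations this backend holds locks on */
    bool     have_prep;  /* was a prep state delivered to the executor? */
    uint32_t unpruned;   /* meaningful only if have_prep is true */
} ToyQuery;

/*
 * Without a prep state, every lockable relation must be locked.
 * With one, at least the unpruned relations must be locked.
 */
static bool
toy_lock_invariant_holds(const ToyQuery *q)
{
    uint32_t must_be_locked = q->have_prep ? q->unpruned : q->lockable;

    return (must_be_locked & ~q->locked) == 0;
}
```

A query that was locked with pruning-aware locking (locked = 0x4 out of
lockable = 0xF) but arrives without its prep state fails the predicate,
which is the case the assert-build check is meant to catch.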
I think this covers the main concerns, but I may be missing something.
If anyone sees a problem with this approach, I'd like to hear about
it.
--
Thanks,
Amit Langote
Attachments:
[application/octet-stream] v6-0004-Use-pruning-aware-locking-in-cached-plans.patch (37.7K, 2-v6-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From 800949bf7a327a7b8bfc5b9fbcdbf0ac39106056 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v6 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.
Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.
The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 26 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 20 ++
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 292 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 29 +-
src/test/regress/expected/partition_prune.out | 50 ++-
src/test/regress/expected/plancache.out | 62 ++++
src/test/regress/sql/partition_prune.sql | 24 +-
src/test/regress/sql/plancache.sql | 51 +++
15 files changed, 576 insertions(+), 26 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 005fbb48aa5..e8cd47131ce 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c93e2664cfd..65dfae58dcf 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..a7a4baaf8af 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4858,8 +4858,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4873,6 +4873,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 994a69a1c8e..13703969dd8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2502,6 +2512,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2576,11 +2587,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..ddb7902bc89 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. There is one ModifyTable
+ * per query level, so this captures exactly one entry per level.
+ * ExecInitModifyTable() asserts that the recorded index matches
+ * what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cd1e429ceed..5c145a31274 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1636,6 +1636,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2017,7 +2018,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2030,7 +2035,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 812e2265734..be2a961a918 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,9 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -139,6 +142,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -940,7 +963,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning, so that
+ * only those partitions are locked. The EStates created in the process are
+ * saved in cprep->prep_estates for later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -949,7 +977,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -977,13 +1005,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1035,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1318,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the EStates in
+ * cprep->prep_estates. These are intended to be passed later to
+ * ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1333,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1356,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1903,6 +1944,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting EStates are appended to
+ * cprep->prep_estates in cprep->context. On release, the same EState
+ * list is consulted to determine which relations to unlock; the caller
+ * then frees the EStates with CachedPlanPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1955,6 +2028,211 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ if (!(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
+ elog(ERROR, "LockRelids(): cannot lock relation at RT index %d",
+ rtindex);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - leaves freeing the EStates to CachedPlanPrepCleanup()
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Always lock the first result relation of each ModifyTable node
+ * in the plan, that is, each entry in
+ * plannedstmt->firstResultRels, even if it was pruned. This
+ * satisfies the executor assumptions described in
+ * ExecInitModifyTable(). It can be wasteful, because the first
+ * result relation may not be needed at all if other result
+ * relations are unpruned and thus sufficient for the ModifyTable
+ * node's needs. Unfortunately, there is no per-node
+ * unpruned_relids set from which to determine that other result
+ * relations are included.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up the EStates built by ExecutorPrep() for a generic plan.
+ *
+ * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the EState list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+
+ if (cprep == NULL)
+ return;
+
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c175ee95b68..989b3c73691 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8c9321aab8c..1431f12a6e8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 984c51515c6..da3ce9f3177 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,32 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one entry per PlannedStmt; NULL for utility stmts */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +266,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 39dab8fcc05..39770f3b6d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4860,9 +4860,7 @@ select c.relname
relname
--------------
prunelock_p1
- prunelock_p2
- prunelock_p3
-(3 rows)
+(1 row)
commit;
deallocate prunelock_q;
@@ -4904,6 +4902,50 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 229c5eb370c..87672ad40f7 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1499,6 +1499,28 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
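
The lock set that AcquireExecutorLocksUnpruned() assembles for a single PlannedStmt can be sketched in isolation. What follows is a toy model under stated assumptions, not code from the patch: RT indexes 1..63 are held in a uint64_t bitmask rather than a Bitmapset, and relset_t, relset_bit(), LockSets and compute_lock_sets() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t relset_t;

/* Hypothetical helper: singleton set holding one RT index (1..63). */
static relset_t
relset_bit(int rti)
{
    assert(rti > 0 && rti < 64);
    return (relset_t) 1 << rti;
}

typedef struct LockSets
{
    relset_t    first;      /* locked before pruning: unprunable rels */
    relset_t    second;     /* locked after pruning: survivors plus the
                             * first result rel of each ModifyTable */
} LockSets;

/*
 * Model of the two locking steps for one statement.  'unpruned' plays the
 * role of es_unpruned_relids, which also contains the unprunable rels.
 */
static LockSets
compute_lock_sets(relset_t unprunable, relset_t unpruned,
                  const int *first_result_rels, size_t nfirst)
{
    LockSets    result;
    size_t      i;

    result.first = unprunable;

    /* Filter out rels already locked in the first step. */
    result.second = unpruned & ~unprunable;

    /* First result rels are locked even when pruning removed them. */
    for (i = 0; i < nfirst; i++)
        result.second |= relset_bit(first_result_rels[i]);

    return result;
}
```

With the parent at RT index 1, partitions at 2, 3 and 4, pruning keeping only 3, and 2 recorded as the first result relation, the second step in this model locks 2 and 3 and leaves 4 alone.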
[application/octet-stream] v6-0003-Add-test-for-partition-lock-behavior-with-generic.patch (5.3K, 3-v6-0003-Add-test-for-partition-lock-behavior-with-generic.patch)
From 58179bd0d3730dbd1fdbb0bd9c624dc7ae770830 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:00:32 +0900
Subject: [PATCH v6 3/6] Add test for partition lock behavior with generic
cached plans
Add a regression test that inspects pg_locks to verify which child
partitions are locked when executing a prepared statement that uses
a generic cached plan.
Two cases are tested: one with enable_partition_pruning on and one
with it off. Currently both cases lock all child partitions, because
GetCachedPlan() acquires execution locks on every relation in the
plan regardless of pruning.
A subsequent commit that adds pruning-aware locking will update the
expected output for the pruning-enabled case, showing that only the
surviving partition is locked.
---
src/test/regress/expected/partition_prune.out | 83 +++++++++++++++++++
src/test/regress/sql/partition_prune.sql | 55 ++++++++++++
2 files changed, 138 insertions(+)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..39dab8fcc05 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,86 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..229c5eb370c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,58 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
--
2.47.3
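As an aside to the regression test above: its header comment says that pruning-aware locking should skip pruned partitions when a generic cached plan is reused. A minimal standalone C sketch of that idea follows. It is not PostgreSQL code. `PartSet`, `prune_for_value()` and `partitions_to_lock()` are hypothetical names, and the toy list partitioning (partition i accepts the single value i + 1) only mirrors the prunelock_p1..p3 layout used in the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit i represents leaf partition i of one table. */
typedef uint32_t PartSet;

/* Toy list partitioning: partition i accepts the single value i + 1. */
static PartSet
prune_for_value(int nparts, int value)
{
	PartSet		survivors = 0;

	for (int i = 0; i < nparts; i++)
	{
		if (value == i + 1)
			survivors |= (PartSet) 1 << i;
	}
	return survivors;
}

/*
 * Return the set of partitions to lock before running a cached plan.
 * Without pruning awareness every partition is locked; with it, only
 * the partitions that survive pruning for the given parameter value.
 */
static PartSet
partitions_to_lock(int nparts, int value, bool pruning_aware)
{
	PartSet		all;

	assert(nparts >= 0 && nparts <= 32);
	all = (nparts == 32) ? UINT32_MAX : (((PartSet) 1 << nparts) - 1);

	return pruning_aware ? (all & prune_for_value(nparts, value)) : all;
}
```

With three partitions and the parameter value 1, the pruning-aware variant selects only the first partition, while the lock-everything variant selects all three.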
[application/octet-stream] v6-0006-Reuse-partition-pruning-results-in-parallel-worke.patch (15.9K, 4-v6-0006-Reuse-partition-pruning-results-in-parallel-worke.patch)
From dc2cfc32410792b3f00422c07623f989901ee34b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v6 6/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.

Introduce CheckInitialPruningResultsInWorker() (debug builds only)
to verify that the results match what the worker would compute. This
check helps catch inconsistencies across leader and worker pruning
logic.
---
src/backend/executor/execParallel.c | 108 +++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 95 +++++++-----------------
2 files changed, 133 insertions(+), 70 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..d337bf8c081 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
 paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index be2a961a918..1d3244307da 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,14 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
CachedPlanPrepData *cprep);
static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
@@ -142,26 +142,6 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
-/*
- * Lock acquisition policy for execution locks.
- *
- * LOCK_ALL acquires locks on all relations mentioned in the plan,
- * reproducing the behavior of AcquireExecutorLocks().
- *
- * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
- * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
- * partitions remaining after performing initial pruning.
- */
-typedef enum LockPolicy
-{
- LOCK_ALL,
- LOCK_UNPRUNED,
-} LockPolicy;
-
-static void AcquireExecutorLocksWithPolicy(List *stmt_list,
- LockPolicy policy, bool acquire,
- CachedPlanPrepData *cprep);
-
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -963,7 +943,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
* If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
* compute the set of partitions that survive initial runtime pruning in order
@@ -977,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -1005,15 +985,16 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
*/
if (plan->is_valid)
{
- LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
-
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1035,7 +1016,10 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
/* Also clean up ExecutorPrep() state, if necessary. */
CachedPlanPrepCleanup(cprep);
@@ -1358,7 +1342,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (cprep)
cprep->params = boundParams;
- if (PrepAndCheckCachedPlan(plansource, cprep))
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1945,43 +1929,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocksWithPolicy
- * Acquire or release execution locks for a cached plan according to
- * the specified policy.
- *
- * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
- * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
- * unprunable rels and partitions that survive initial runtime pruning.
- *
- * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
- * each PlannedStmt and the resulting EStates are appended to
- * cprep->prep_estates in cprep->context. On release, the same EState
- * list is consulted to determine which relations to unlock and each
- * EState is released.
- */
-static void
-AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
- CachedPlanPrepData *cprep)
-{
- switch (policy)
- {
- case LOCK_ALL:
- AcquireExecutorLocks(stmt_list, acquire);
- break;
- case LOCK_UNPRUNED:
- AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
- break;
- default:
- elog(ERROR, "invalid LockPolicy");
- }
-}
-
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -2044,10 +1998,8 @@ LockRelids(List *rtable, Bitmapset *relids, bool acquire)
{
RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- elog(ERROR, "LockRelids(): cannot lock relation at RT index %d",
- rtindex);
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
@@ -2204,7 +2156,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
* CachedPlanPrepCleanup
* Clean up EState built for a generic plan.
*
- * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * This is used in the corner case where CheckCachedPlan() discovers
* that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
* has already run. In that case we must both release the execution locks
* and dispose of the ExecPrep list stored in CachedPlanPrepData, since the
@@ -2214,10 +2166,14 @@ static void
CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
{
ListCell *lc;
+ ResourceOwner oldowner;
if (cprep == NULL)
return;
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
foreach(lc, cprep->prep_estates)
{
EState *prep_estate = (EState *) lfirst(lc);
@@ -2228,6 +2184,7 @@ CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
ExecCloseRangeTableRelations(prep_estate);
FreeExecutorState(prep_estate);
}
+ CurrentResourceOwner = oldowner;
list_free(cprep->prep_estates);
cprep->prep_estates = NIL;
--
2.47.3
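The hand-off described in this patch's commit message (the leader serializes its initial pruning results, workers reuse them, and an optional cross-check compares them with a local recomputation) can be modeled in a few lines of standalone C. This is only a sketch under simplifying assumptions: a plain unsigned bitmask stands in for the serialized node tree, a caller-supplied buffer stands in for the shared memory segment, and `leader_store()` / `worker_load()` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Leader side: serialize the surviving-subplan bitmask as text and copy it,
 * including the terminating NUL, into the "shared" buffer.  Returns the
 * number of bytes stored, or 0 if the buffer is too small.
 */
static size_t
leader_store(unsigned survivors, char *shared, size_t shared_size)
{
	char		text[32];
	size_t		len;

	assert(shared != NULL);
	snprintf(text, sizeof(text), "%u", survivors);
	len = strlen(text) + 1;
	if (len > shared_size)
		return 0;
	memcpy(shared, text, len);
	return len;
}

/*
 * Worker side: reuse the leader's result instead of pruning again.  When
 * 'check' is nonzero (think: assert-enabled build), compare it with the
 * locally recomputed value and return -1 on mismatch.
 */
static int
worker_load(const char *shared, unsigned recomputed, int check, unsigned *out)
{
	unsigned	from_leader;

	assert(shared != NULL && out != NULL);
	from_leader = (unsigned) strtoul(shared, NULL, 10);
	if (check && from_leader != recomputed)
		return -1;
	*out = from_leader;
	return 0;
}
```

The design point the sketch tries to capture is that the worker's answer always comes from the leader; the local recomputation is used only to detect divergence, never to override the leader's result.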
[application/octet-stream] v6-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 5-v6-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 836f0b63ced2546b594643043b7d0055ffaa7b66 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v6 5/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
ExecPrep pointers in the execution_state nodes.

At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.

This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.

Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 65dfae58dcf..c70e06d8886 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -764,6 +778,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,6 +1377,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1370,7 +1394,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1461,6 +1485,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
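The pattern this patch relies on, walking the statement list alongside an optional list of prepared states and handing each statement either its prepared state or NULL, can be sketched in standalone C. The names below (`PrepState`, `next_prep()`, `start_stmt()`) are hypothetical; in particular `next_prep()` is only a stand-in for a helper in the spirit of next_prep_estate(), whose definition does not appear in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-statement state built ahead of time. */
typedef struct PrepState
{
	int			stmt_id;
	int			reused;
} PrepState;

/*
 * Return the next entry of an optional array of prepared states, or NULL
 * when no array was supplied or it has been exhausted.
 */
static PrepState *
next_prep(PrepState *preps, size_t npreps, size_t *pos)
{
	if (preps == NULL || *pos >= npreps)
		return NULL;
	return &preps[(*pos)++];
}

/*
 * Start a statement: reuse the prepared state when one was passed in,
 * otherwise build the state lazily in caller-provided scratch space.
 */
static PrepState *
start_stmt(int stmt_id, PrepState *prep, PrepState *scratch)
{
	assert(scratch != NULL);
	if (prep != NULL)
	{
		prep->reused = 1;
		return prep;
	}
	scratch->stmt_id = stmt_id;
	scratch->reused = 0;
	return scratch;
}
```

Callers that have no prepared states simply pass NULL and get the lazily built state, which keeps the statement loop identical for both cases.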
[application/octet-stream] v6-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (27.6K, 6-v6-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From aeaaa5059a7be06c301b1372c16829225b2770fb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v6 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep() builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present, and returns it
directly to the caller.

ExecutorStart() now checks if QueryDesc->estate is already set
(indicating ExecutorPrep() was called earlier). If so, it reuses
the EState to avoid redoing range table setup and pruning.
Otherwise, it invokes ExecutorPrep() itself and adopts the
resulting EState for the duration of the query. This keeps the
executor startup behavior unchanged while making the setup work
callable separately when needed.

CreateQueryDesc() grows a prep_estate argument to accept an
optionally pre-created EState and stores it in the QueryDesc.
Portals, SPI, SQL functions, and EXPLAIN are wired to carry
optional EState pointers alongside the PlannedStmt list, but most
callers still pass NULL and let ExecutorStart() perform the setup
lazily.

ExecutorPrep() requires the caller to have established an active
snapshot, as partition pruning expressions may call PL functions
that internally require one (e.g., via EnsurePortalSnapshotExists()).

Update executor/README and related comments to document the new
control flow and the separation between preparation and execution.

Note that as of this commit, ExecutorStart() is the only caller of
ExecutorPrep(), so there is no semantic change in behavior. Later
commits will add specialized callers that invoke ExecutorPrep()
earlier to enable pruning-aware locking in cached plans.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 176 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 ++++
src/include/nodes/execnodes.h | 1 -
src/include/utils/portal.h | 2 +
20 files changed, 241 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..ef1ee2568c6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -875,7 +875,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 93918a223b8..40564d4dff9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -550,7 +551,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 963618a64c4..ff759ddd07c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1173,7 +1173,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 5b86a727587..005fbb48aa5 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,7 +576,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -650,14 +653,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 654f9246ad0..d7e99690c7f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,6 +55,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -145,7 +146,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -171,9 +171,71 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +325,84 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an EState that can be reused later.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -838,38 +978,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 4ca342a43ef..c93e2664cfd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1368,7 +1368,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3019a3b2b97..994a69a1c8e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2499,6 +2500,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2577,6 +2580,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2614,9 +2618,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2694,7 +2700,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d01a09dd0c4..cd1e429ceed 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1230,6 +1230,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2029,6 +2030,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index c1a53e658cb..941e95010c3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -297,6 +298,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 86226f8db70..3756a11345f 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..e6fa122e6e4 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc. Safe when
+ * prep_estates is NIL; just returns NULL for every call.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..84d80e3ab0d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -775,7 +775,6 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
-
/*
* ExecRowMark -
* runtime representation of FOR [KEY] UPDATE/SHARE clauses
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
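A note on the next_prep_estate() helper added to executor.h in the patch above: it walks an optional side list in step with the statement list and hands back NULL once the side list is empty or exhausted. Below is a minimal standalone sketch of that pattern. DemoCell, demo_next() and demo_count_prepped() are hypothetical stand-ins invented for illustration; the sketch uses a plain linked list rather than PostgreSQL's array-based List, so it shows only the control flow, not the real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list cell; not PostgreSQL's ListCell. */
typedef struct DemoCell
{
	void	   *ptr;			/* payload, may be NULL */
	struct DemoCell *next;		/* next cell, or NULL at the end */
} DemoCell;

/*
 * Return the payload of the current cell (or NULL if there is no cell)
 * and advance *cell.  Mirrors the shape of next_prep_estate().
 */
static void *
demo_next(DemoCell **cell)
{
	void	   *result = NULL;

	if (*cell != NULL)
	{
		result = (*cell)->ptr;
		*cell = (*cell)->next;
	}
	return result;
}

/*
 * Iterate over nstmts "statements" in step with a side list and count
 * how many of them received a non-NULL side entry.
 */
static int
demo_count_prepped(int nstmts, DemoCell *side_head)
{
	DemoCell   *cell = side_head;
	int			prepped = 0;

	for (int i = 0; i < nstmts; i++)
	{
		if (demo_next(&cell) != NULL)
			prepped++;
	}
	return prepped;
}
```

The point of the design is that callers need no special case for an absent side list: an empty list simply yields NULL for every statement, which is what the call sites passing NIL rely on.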
[application/octet-stream] v6-0001-Refactor-partition-pruning-initialization-for-cla.patch (10.2K, 7-v6-0001-Refactor-partition-pruning-initialization-for-cla.patch)
download | inline diff:
From 6f2c9cc7a30d38cb2606595f62b62c77e2aba6e9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 15:08:52 +0900
Subject: [PATCH v6 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called at a time when the PARAM_EXEC
parameters have not yet been set up.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 103 +++++++++++++++++++--------
src/include/executor/execPartition.h | 1 +
3 files changed, 74 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..654f9246ad0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -868,6 +868,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bab294f5e91..20c3513fabe 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,8 +184,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1942,6 +1941,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1966,6 +1968,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1973,11 +1998,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1995,20 +2020,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -2016,8 +2034,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2135,14 +2151,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2376,8 +2390,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2389,10 +2403,29 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
+ }
+#endif
+ }
j++;
}
@@ -2489,9 +2522,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2519,9 +2553,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
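To make the 0001 split easier to follow, here is a rough standalone model of the two phases: state creation, which records every leaf partition as unpruned when initial pruning steps exist but are being skipped, and the optional pruning step, which records only the survivors. All names here (DemoPruneState, DemoEState, demo_create_state(), demo_do_initial_pruning(), demo_prep()) are hypothetical, and a uint32_t bitmask stands in for a Bitmapset; this is a sketch under those assumptions, not the executor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; none of these are PostgreSQL types. */
typedef struct DemoPruneState
{
	uint32_t	all_parts;			/* leaf partitions under this plan node */
	uint32_t	matching_parts;		/* partitions surviving initial pruning */
	bool		has_initial_steps;	/* plan carries initial pruning steps */
	bool		do_initial_prune;	/* false when pruning is skipped */
} DemoPruneState;

typedef struct DemoEState
{
	uint32_t	unpruned;			/* stand-in for es_unpruned_relids */
} DemoEState;

/* Phase 1: setup only.  If pruning is skipped, keep every partition. */
static void
demo_create_state(DemoEState *estate, const DemoPruneState *ps)
{
	if (ps->has_initial_steps && !ps->do_initial_prune)
		estate->unpruned |= ps->all_parts;
}

/* Phase 2: optional.  Record only the partitions that survive pruning. */
static void
demo_do_initial_pruning(DemoEState *estate, const DemoPruneState *ps)
{
	if (ps->do_initial_prune)
		estate->unpruned |= ps->matching_parts;
}

/* Run setup, then pruning only if the caller asks for it. */
static uint32_t
demo_prep(uint32_t unprunable, const DemoPruneState *ps, bool run_pruning)
{
	DemoEState	estate = {unprunable};

	demo_create_state(&estate, ps);
	if (run_pruning)
		demo_do_initial_pruning(&estate, ps);
	return estate.unpruned;
}
```

Calling demo_prep() with run_pruning set to false corresponds loosely to a caller that only wants the pruning states built, which is the capability the commit message describes as useful for metadata-only initialization.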
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-22 15:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
2025-02-22 17:02 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-23 08:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-23 12:46 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-25 02:51 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 03:06 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-20 13:25 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 15:38 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-21 10:22 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-22 08:12 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-06-20 12:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-17 12:11 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-22 06:43 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-12 14:17 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-17 12:50 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-20 07:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-24 03:29 ` Re: generic plans and "initial" pruning Chao Li <[email protected]>
2025-11-25 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-02-11 04:05 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-07 09:54 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2026-03-09 04:41 ` Amit Langote <[email protected]>
2026-03-19 17:20 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-03-09 04:41 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Sat, Mar 7, 2026 at 6:54 PM Amit Langote <[email protected]> wrote:
> Attached is v6 of the patch series. I've been working toward
> committing this, so I wanted to lay out the ExecutorPrep() design and
> the key trade-offs before doing so.
>
> When a cached generic plan references a partitioned table,
> GetCachedPlan() locks all partitions upfront via
> AcquireExecutorLocks(), even those that initial pruning will
> eliminate. But initial partition pruning only runs later during
> ExecutorStart(). Moving pruning earlier requires some executor setup
> (range table, permissions, pruning state), and ExecutorPrep() is the
> vehicle for that. Unlike the approach reverted last May, this
> keeps the CachedPlan itself unchanged -- all per-execution state flows
> through a separate CachedPlanPrepData that the caller provides.
>
> The approach also keeps GetCachedPlan()'s interface
> backward-compatible: the new CachedPlanPrepData argument is optional.
> If a caller passes NULL, all partitions are locked as before and
> nothing changes. This means existing callers and any new code that
> calls GetCachedPlan() without caring about pruning-aware locking just
> works.
>
> The risk is on the other side: if a caller does pass a
> CachedPlanPrepData, GetCachedPlan() will lock only the surviving
> partitions and populate prep_estates with the EStates that
> ExecutorPrep() created. The caller then must make those EStates
> available to ExecutorStart() -- via QueryDesc->estate,
> portal->prep_estates, or the equivalent path for SPI and SQL
> functions. If it fails to do so, ExecutorStart() will call
> ExecutorPrep() again, which may compute different pruning results than
> the original call, potentially expecting locks on relations that were
> never acquired. The executor would then operate on relations it
> doesn't hold locks on.
>
> So the contract is: if you opt in to pruning-aware locking by passing
> CachedPlanPrepData, you must complete the pipeline by delivering the
> prep EStates to the executor. In the current patch, all the call sites
> that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
> EXPLAIN) do thread the EStates through correctly, and I've tried to
> make the plumbing straightforward enough that it's hard to get wrong.
> But it is a new invariant that didn't exist before, and a caller that
> gets it wrong would fail silently rather than with an obvious error.
>
> To catch such violations, I've added a debug-only check in
> standard_ExecutorStart() that fires when no prep EState was provided.
> It iterates over the plan's rtable and verifies that every lockable
> relation is actually locked. It should always be true if
> AcquireExecutorLocks() locked everything, but would fail if
> pruning-aware locking happened upstream and the caller dropped the
> prep EState. The check is skipped in parallel workers, which acquire
> relation locks lazily in ExecGetRangeTableRelation().
>
> + if (queryDesc->estate == NULL)
> + {
> +#ifdef USE_ASSERT_CHECKING
> + if (!IsParallelWorker())
> + {
> + ListCell *lc;
> +
> + foreach(lc, queryDesc->plannedstmt->rtable)
> + {
> + RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
> +
> + if (rte->rtekind == RTE_RELATION ||
> + (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
> + Assert(CheckRelationOidLockedByMe(rte->relid,
> + rte->rellockmode,
> + true));
> + }
> + }
> +#endif
> + queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
> + queryDesc->params,
> + CurrentResourceOwner,
> + true,
> + eflags);
> + }
> +#ifdef USE_ASSERT_CHECKING
> + else
> + {
> + /*
> + * A prep EState was provided, meaning pruning-aware locking
> + * should have locked at least the unpruned relations.
> + */
> + if (!IsParallelWorker())
> + {
> + int rtindex = -1;
> +
> + while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
> + rtindex)) >= 0)
> + {
> + RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
> +
> + Assert(rte->rtekind == RTE_RELATION ||
> + (rte->rtekind == RTE_SUBQUERY &&
> + rte->relid != InvalidOid));
> + Assert(CheckRelationOidLockedByMe(rte->relid,
> + rte->rellockmode, true));
> + }
> + }
> + }
> +#endif
>
> So the invariant is: if no prep EState was provided, every relation in
> the plan is locked; if one was provided, at least the unpruned
> relations are locked. Both are checked in assert builds.
>
> I think this covers the main concerns, but I may be missing something.
> If anyone sees a problem with this approach, I'd like to hear about
> it.
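To make the contract quoted above concrete, here is a minimal standalone sketch in C. It is a toy model, not PostgreSQL code: ToyPrep, ToyLocks, and the toy_* functions are hypothetical stand-ins for CachedPlanPrepData, the lock table, GetCachedPlan(), and the assert-build check in standard_ExecutorStart(). It only illustrates why a caller that opts in to pruning-aware locking and then drops the prep state would trip the "every relation is locked" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NRELS 3					/* toy plan: three partitions */

/* Hypothetical stand-in for a prep EState: only the pruning survivors. */
typedef struct ToyPrep
{
	bool		unpruned[NRELS];
} ToyPrep;

/* Which relations this toy backend currently holds locks on. */
typedef struct ToyLocks
{
	bool		held[NRELS];
} ToyLocks;

/* Toy "initial pruning": only the partition matching the parameter survives. */
static void
toy_prune(int param, ToyPrep *prep)
{
	for (int i = 0; i < NRELS; i++)
		prep->unpruned[i] = (i == param);
}

/*
 * Toy plan-cache lookup.  With prep == NULL every relation is locked (the
 * pre-existing behavior).  Otherwise pruning runs first and only the
 * surviving relations are locked.
 */
static void
toy_get_cached_plan(int param, ToyPrep *prep, ToyLocks *locks)
{
	if (prep != NULL)
		toy_prune(param, prep);
	for (int i = 0; i < NRELS; i++)
		locks->held[i] = (prep == NULL) || prep->unpruned[i];
}

/*
 * Toy version of the debug-only check.  Without a prep state, every relation
 * must be locked; with one, only the unpruned relations must be.
 */
static bool
toy_executor_start_locks_ok(const ToyPrep *prep, const ToyLocks *locks)
{
	for (int i = 0; i < NRELS; i++)
	{
		bool		needed = (prep == NULL) || prep->unpruned[i];

		if (needed && !locks->held[i])
			return false;
	}
	return true;
}
```

In this model, calling toy_get_cached_plan(1, &prep, &locks) locks only partition 1. Handing the same prep to toy_executor_start_locks_ok() passes, while handing it NULL (the caller dropped the prep state) fails, which corresponds to the case the assert-build check is meant to catch.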
Here's v7. Some plancache.c changes that I'd made were in the wrong
patch in v6; this version puts them where they belong.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v7-0003-Add-test-for-partition-lock-behavior-with-generic.patch (5.3K, 2-v7-0003-Add-test-for-partition-lock-behavior-with-generic.patch)
From 58179bd0d3730dbd1fdbb0bd9c624dc7ae770830 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:00:32 +0900
Subject: [PATCH v7 3/6] Add test for partition lock behavior with generic
cached plans
Add a regression test that inspects pg_locks to verify which child
partitions are locked when executing a prepared statement that uses
a generic cached plan.
Two cases are tested: one with enable_partition_pruning on and one
with it off. Currently both cases lock all child partitions, because
GetCachedPlan() acquires execution locks on every relation in the
plan regardless of pruning.
A subsequent commit that adds pruning-aware locking will update the
expected output for the pruning-enabled case, showing that only the
surviving partition is locked.
---
src/test/regress/expected/partition_prune.out | 83 +++++++++++++++++++
src/test/regress/sql/partition_prune.sql | 55 ++++++++++++
2 files changed, 138 insertions(+)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..39dab8fcc05 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,86 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..229c5eb370c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,58 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
--
2.47.3
[application/octet-stream] v7-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 3-v7-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From c67ec5cc6bbe20d7ad14fb99cd1696939c6ec70f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v7 5/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.
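As a rough illustration of pairing each statement with its optional prep state, here is a small standalone sketch in C. The patch uses a helper named next_prep_estate() for this, but its definition is not part of this excerpt, so ToyCursor and toy_next_prep() below are hypothetical and inferred only from the call sites: the helper is also called when the prep list is NIL, so the sketch assumes it yields NULL in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over an optional array of prep states. */
typedef struct ToyCursor
{
	void	  **items;			/* NULL when no prep states were produced */
	size_t		nitems;
	size_t		pos;
} ToyCursor;

/*
 * Return the next prep state, or NULL if the list is empty or exhausted.
 * Assumption: a missing prep state simply means "let executor startup do
 * the setup itself", so NULL is a valid result rather than an error.
 */
static void *
toy_next_prep(ToyCursor *cur)
{
	if (cur->items == NULL || cur->pos >= cur->nitems)
		return NULL;
	return cur->items[cur->pos++];
}
```

A caller would advance the cursor once per statement while iterating over the statement list, so statement N receives prep state N when one exists and NULL otherwise.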
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 65dfae58dcf..c70e06d8886 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -764,6 +778,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,6 +1377,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1370,7 +1394,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1461,6 +1485,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v7-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (27.6K, 4-v7-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From aeaaa5059a7be06c301b1372c16829225b2770fb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v7 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep() builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present, and returns it
directly to the caller.
ExecutorStart() now checks if QueryDesc->estate is already set
(indicating ExecutorPrep() was called earlier). If so, it reuses
the EState to avoid redoing range table setup and pruning.
Otherwise, it invokes ExecutorPrep() itself and adopts the
resulting EState for the duration of the query. This keeps the
executor startup behavior unchanged while making the setup work
callable separately when needed.
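The reuse-or-create behavior described above can be sketched with a toy model in C. This is not the executor code: ToyEState, ToyQueryDesc, toy_executor_prep(), and toy_executor_start() are hypothetical names, and the counter exists only to show that setup runs once whether it happens early or lazily.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for EState and QueryDesc. */
typedef struct ToyEState
{
	int			id;				/* which prep call created this state */
} ToyEState;

typedef struct ToyQueryDesc
{
	ToyEState  *estate;			/* NULL unless prep already ran */
} ToyQueryDesc;

static int	toy_prep_calls = 0;	/* number of setup passes performed */

/* Toy prep step: allocate state and count the setup pass. */
static ToyEState *
toy_executor_prep(void)
{
	ToyEState  *estate = calloc(1, sizeof(ToyEState));

	if (estate == NULL)
		abort();
	estate->id = ++toy_prep_calls;
	return estate;
}

/* Toy start step: reuse a provided state, otherwise prepare one now. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
	if (qd->estate == NULL)
		qd->estate = toy_executor_prep();
	/* plan initialization would continue here using qd->estate */
}
```

A caller that never prepares gets the setup done inside toy_executor_start(), while a caller that prepared earlier and stored the result in the descriptor sees no second setup pass.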
CreateQueryDesc() grows a prep_estate argument to accept an
optionally pre-created EState and stores it in the QueryDesc.
Portals, SPI, SQL functions, and EXPLAIN are wired to carry
optional EState pointers alongside the PlannedStmt list, but most
callers still pass NULL and let ExecutorStart() perform the setup
lazily.
ExecutorPrep() requires the caller to have established an active
snapshot, as partition pruning expressions may call PL functions
that internally require one (e.g., via EnsurePortalSnapshotExists()).
Update executor/README and related comments to document the new
control flow and the separation between preparation and execution.
Note that as of this commit, ExecutorStart() is the only caller of
ExecutorPrep(), so there is no semantic change in behavior. Later
commits will add specialized callers that invoke ExecutorPrep()
earlier to enable pruning-aware locking in cached plans.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 176 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 ++++
src/include/nodes/execnodes.h | 1 -
src/include/utils/portal.h | 2 +
20 files changed, 241 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..ef1ee2568c6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -875,7 +875,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 93918a223b8..40564d4dff9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -550,7 +551,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 963618a64c4..ff759ddd07c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1173,7 +1173,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 5b86a727587..005fbb48aa5 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,7 +576,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -650,14 +653,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 654f9246ad0..d7e99690c7f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,6 +55,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -145,7 +146,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -171,9 +171,71 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +325,84 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an EState that can be reused later.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -838,38 +978,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 4ca342a43ef..c93e2664cfd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1368,7 +1368,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3019a3b2b97..994a69a1c8e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2499,6 +2500,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2577,6 +2580,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2614,9 +2618,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2694,7 +2700,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d01a09dd0c4..cd1e429ceed 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1230,6 +1230,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2029,6 +2030,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index c1a53e658cb..941e95010c3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -297,6 +298,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 86226f8db70..3756a11345f 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..e6fa122e6e4 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (which may be NULL) and advances *lc. Safe when
+ * prep_estates is NIL or exhausted: every such call returns NULL.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..84d80e3ab0d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -775,7 +775,6 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
-
/*
* ExecRowMark -
* runtime representation of FOR [KEY] UPDATE/SHARE clauses
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* ExecutorPrep() EStates parallel to stmts, or NIL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
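
The prep_estates plumbing in the patch above relies on walking an optional list in step with stmt_list. The following is a minimal standalone sketch of that pattern, not code from the patch: Cell, next_or_null(), count_prepared() and demo_with_utility_gap() are hypothetical stand-ins that only mirror the shape of next_prep_estate(), assuming a plain singly linked list instead of the PostgreSQL List API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list cell; not the PostgreSQL ListCell. */
typedef struct Cell
{
	void	   *ptr;
	struct Cell *next;
} Cell;

/*
 * Return the payload at *lc and advance *lc.  Returns NULL when the cursor
 * is already past the end, which also covers an empty list.
 */
static void *
next_or_null(Cell **lc)
{
	void	   *result = NULL;

	if (*lc != NULL)
	{
		result = (*lc)->ptr;
		*lc = (*lc)->next;
	}
	return result;
}

/*
 * Walk the prep list in step with a loop over nstmts statements and count
 * how many statements received a non-NULL prep state.
 */
static int
count_prepared(int nstmts, Cell *prep_head)
{
	Cell	   *lc = prep_head;
	int			nprepared = 0;

	for (int i = 0; i < nstmts; i++)
	{
		if (next_or_null(&lc) != NULL)
			nprepared++;
	}
	return nprepared;
}

/*
 * Three statements whose middle entry is NULL, modelling a utility
 * statement that has no prep state but still occupies a list slot.
 */
static int
demo_with_utility_gap(void)
{
	int			a = 1;
	int			c = 3;
	Cell		third = {&c, NULL};
	Cell		second = {NULL, &third};
	Cell		first = {&a, &second};

	return count_prepared(3, &first);
}
```

In this sketch an empty prep list and a NULL slot look the same to the caller, which appears to be why the call sites can pass the result straight to CreateQueryDesc() without a separate check.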
[application/octet-stream] v7-0004-Use-pruning-aware-locking-in-cached-plans.patch (36.1K, 5-v7-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From e0130ef11bfb97dba5afce22370cba5f3741ab0a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v7 4/6] Use pruning-aware locking in cached plans

Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.

Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.

Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.

Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.

To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.

Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.

The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
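
To make the locking rule in the commit message concrete, here is a small standalone model, offered only as a sketch. It assumes relid sets fit in a 32-bit mask (bit i stands for range table index i); RelidSet, locks_for_all(), locks_for_unpruned() and count_members() are hypothetical names and are not part of the patch or of the Bitmapset API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i set means range table index i is in the set. */
typedef uint32_t RelidSet;

/* Old behavior: every relation in the range table gets locked. */
static RelidSet
locks_for_all(RelidSet unprunable, RelidSet prunable)
{
	return unprunable | prunable;
}

/*
 * Pruning-aware behavior as described in the commit message: lock the
 * unprunable relations, the partitions that survive initial pruning, and
 * the first result relation of each ModifyTable even if it was pruned.
 */
static RelidSet
locks_for_unpruned(RelidSet unprunable, RelidSet surviving,
				   RelidSet first_result_rels)
{
	return unprunable | surviving | first_result_rels;
}

/* Count set bits by clearing the lowest set bit until none remain. */
static int
count_members(RelidSet s)
{
	int			n = 0;

	while (s != 0)
	{
		s &= s - 1;
		n++;
	}
	return n;
}
```

With a parent at index 1 and partitions at indexes 2 to 5, locking everything takes five locks, while pruning down to partition 3 takes two. If every partition is pruned but index 2 is a first result relation, indexes 1 and 2 are still locked.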
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 26 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 20 ++
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 255 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 29 +-
src/test/regress/expected/partition_prune.out | 50 +++-
src/test/regress/expected/plancache.out | 62 +++++
src/test/regress/sql/partition_prune.sql | 24 +-
src/test/regress/sql/plancache.sql | 51 ++++
15 files changed, 536 insertions(+), 29 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 005fbb48aa5..e8cd47131ce 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c93e2664cfd..65dfae58dcf 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..a7a4baaf8af 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4858,8 +4858,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4873,6 +4873,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 994a69a1c8e..13703969dd8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2502,6 +2512,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2576,11 +2587,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..ddb7902bc89 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. There is one ModifyTable
+ * per query level, so this captures exactly one entry per level.
+ * ExecInitModifyTable() asserts that the recorded index matches
+ * what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cd1e429ceed..5c145a31274 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1636,6 +1636,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2017,7 +2018,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2030,7 +2035,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 812e2265734..1d3244307da 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The EStates created to do so are saved in cprep for
+ * later reuse by ExecutorStart().
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +991,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1016,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1302,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the EStates in
+ * cprep->prep_estates. These are intended to be passed later to
+ * ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1317,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1340,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1929,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1982,214 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - cleans up each EState
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Always lock the first result relation of each ModifyTable
+ * node, that is, every entry in plannedstmt->firstResultRels,
+ * even if it was pruned. This satisfies the executor
+ * assumptions described in ExecInitModifyTable(). It can be
+ * wasteful, because the first result relation may not be needed
+ * at all if other result relations are unpruned and thus
+ * sufficient for the ModifyTable node's needs. Unfortunately,
+ * there is no per-node unpruned_relids set from which to
+ * determine that other result relations are included.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up EState built for a generic plan.
+ *
+ * This is used in the corner case where CheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the EState list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+ CurrentResourceOwner = oldowner;
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c175ee95b68..989b3c73691 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8c9321aab8c..1431f12a6e8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 984c51515c6..da3ce9f3177 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,32 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +266,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 39dab8fcc05..39770f3b6d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4860,9 +4860,7 @@ select c.relname
relname
--------------
prunelock_p1
- prunelock_p2
- prunelock_p3
-(3 rows)
+(1 row)
commit;
deallocate prunelock_q;
@@ -4904,6 +4902,50 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 229c5eb370c..87672ad40f7 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1499,6 +1499,28 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
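
The lock-set arithmetic in AcquireExecutorLocksUnpruned() above can be illustrated with a minimal, self-contained sketch. It is not code from the patch: RelidSet and second_pass_lock_set() are hypothetical names, and a 64-bit mask stands in for a Bitmapset purely for illustration. The sketch mirrors the order of operations in the patch, assuming the unprunable relations were already locked in a first pass.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a Bitmapset of range-table indexes:
 * bit i set means RT index i is a member (so at most 63 relations).
 * This type and the function below are illustrative, not part of the patch.
 */
typedef uint64_t RelidSet;

/*
 * Compute the RT indexes to lock in the second pass, after the
 * unprunable relations have already been locked.
 */
static RelidSet
second_pass_lock_set(RelidSet unprunable, RelidSet unpruned,
                     RelidSet first_result_rels)
{
    /* Skip relations already locked in the first pass. */
    RelidSet lock_relids = unpruned & ~unprunable;

    /* Add back each ModifyTable node's first result relation, pruned or not. */
    lock_relids |= first_result_rels;

    return lock_relids;
}
```

For example, with unprunable = {1}, unpruned = {1, 3} and a first result relation {2} that was pruned, the second pass covers {2, 3}: second_pass_lock_set(0x2, 0xA, 0x4) yields 0xC.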
[application/octet-stream] v7-0006-Reuse-partition-pruning-results-in-parallel-worke.patch (8.2K, 6-v7-0006-Reuse-partition-pruning-results-in-parallel-worke.patch)
From 9c94b3751ae0c9decc337e33de2750a954a88d6f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v7 6/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce CheckInitialPruningResultsInWorker() (debug builds only)
to verify that the results match what the worker would compute. This
check helps catch inconsistencies across leader and worker pruning
logic.
---
src/backend/executor/execParallel.c | 108 +++++++++++++++++++++++++++-
1 file changed, 107 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..d337bf8c081 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+#endif
}
/*
--
2.47.3
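
The consistency check that CheckInitialPruningResultsInWorker() performs can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the patch's code: SubplanSet, first_pruning_mismatch() and rtis_covered() are hypothetical names, and 64-bit masks stand in for the Bitmapsets carried in es_part_prune_results and es_unpruned_relids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset; bit i set means member i. */
typedef uint64_t SubplanSet;

/*
 * Compare the leader-supplied valid-subplan sets with the ones the worker
 * recomputed, one entry per pruning node. Returns the index of the first
 * mismatching node, or -1 if all entries agree.
 */
static int
first_pruning_mismatch(const SubplanSet *from_leader,
                       const SubplanSet *recomputed, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (from_leader[i] != recomputed[i])
            return (int) i;
    }
    return -1;
}

/*
 * True if every RT index the worker found valid is also in the unpruned
 * set received from the leader (the set difference is empty).
 */
static bool
rtis_covered(SubplanSet validsubplan_rtis, SubplanSet unpruned_relids)
{
    return (validsubplan_rtis & ~unpruned_relids) == 0;
}
```

In the patch, a failure of either check raises an error in assert-enabled builds; the sketch only reports the result so that a caller can decide how to react.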
[application/octet-stream] v7-0001-Refactor-partition-pruning-initialization-for-cla.patch (10.2K, 7-v7-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 6f2c9cc7a30d38cb2606595f62b62c77e2aba6e9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 15:08:52 +0900
Subject: [PATCH v7 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called at a time when the PARAM_EXEC
parameters have not yet been set up.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 103 +++++++++++++++++++--------
src/include/executor/execPartition.h | 1 +
3 files changed, 74 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..654f9246ad0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -868,6 +868,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bab294f5e91..20c3513fabe 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,8 +184,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1942,6 +1941,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates:
+ * Create a PartitionPruneState for each PartitionPruneInfo in the EState.
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1966,6 +1968,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1973,11 +1998,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1995,20 +2020,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -2016,8 +2034,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2135,14 +2151,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2376,8 +2390,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2389,10 +2403,29 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
+ }
+#endif
+ }
j++;
}
@@ -2489,9 +2522,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2519,9 +2553,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when ExecCreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-22 15:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
2025-02-22 17:02 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-23 08:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-23 12:46 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-25 02:51 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 03:06 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-20 13:25 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 15:38 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-21 10:22 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-22 08:12 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-06-20 12:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-17 12:11 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-22 06:43 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-12 14:17 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-17 12:50 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-20 07:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-24 03:29 ` Re: generic plans and "initial" pruning Chao Li <[email protected]>
2025-11-25 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-02-11 04:05 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-07 09:54 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-09 04:41 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2026-03-19 17:20 ` Amit Langote <[email protected]>
2026-03-25 07:39 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-03-19 17:20 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> On Sat, Mar 7, 2026 at 6:54 PM Amit Langote <[email protected]> wrote:
> > Attached is v6 of the patch series. I've been working toward
> > committing this, so I wanted to lay out the ExecutorPrep() design and
> > the key trade-offs before doing so.
> >
> > When a cached generic plan references a partitioned table,
> > GetCachedPlan() locks all partitions upfront via
> > AcquireExecutorLocks(), even those that initial pruning will
> > eliminate. But initial partition pruning only runs later during
> > ExecutorStart(). Moving pruning earlier requires some executor setup
> > (range table, permissions, pruning state), and ExecutorPrep() is the
> > vehicle for that. Unlike the approach reverted last May, this
> > keeps the CachedPlan itself unchanged -- all per-execution state flows
> > through a separate CachedPlanPrepData that the caller provides.
> >
> > The approach also keeps GetCachedPlan()'s interface
> > backward-compatible: the new CachedPlanPrepData argument is optional.
> > If a caller passes NULL, all partitions are locked as before and
> > nothing changes. This means existing callers and any new code that
> > calls GetCachedPlan() without caring about pruning-aware locking just
> > works.
> >
> > The risk is on the other side: if a caller does pass a
> > CachedPlanPrepData, GetCachedPlan() will lock only the surviving
> > partitions and populate prep_estates with the EStates that
> > ExecutorPrep() created. The caller then must make those EStates
> > available to ExecutorStart() -- via QueryDesc->estate,
> > portal->prep_estates, or the equivalent path for SPI and SQL
> > functions. If it fails to do so, ExecutorStart() will call
> > ExecutorPrep() again, which may compute different pruning results than
> > the original call, potentially expecting locks on relations that were
> > never acquired. The executor would then operate on relations it
> > doesn't hold locks on.
> >
> > So the contract is: if you opt in to pruning-aware locking by passing
> > CachedPlanPrepData, you must complete the pipeline by delivering the
> > prep EStates to the executor. In the current patch, all the call sites
> > that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
> > EXPLAIN) do thread the EStates through correctly, and I've tried to
> > make the plumbing straightforward enough that it's hard to get wrong.
> > But it is a new invariant that didn't exist before, and a caller that
> > gets it wrong would fail silently rather than with an obvious error.
> >
> > To catch such violations, I've added a debug-only check in
> > standard_ExecutorStart() that fires when no prep EState was provided.
> > It iterates over the plan's rtable and verifies that every lockable
> > relation is actually locked. It should always be true if
> > AcquireExecutorLocks() locked everything, but would fail if
> > pruning-aware locking happened upstream and the caller dropped the
> > prep EState. The check is skipped in parallel workers, which acquire
> > relation locks lazily in ExecGetRangeTableRelation().
> >
> > + if (queryDesc->estate == NULL)
> > + {
> > +#ifdef USE_ASSERT_CHECKING
> > + if (!IsParallelWorker())
> > + {
> > + ListCell *lc;
> > +
> > + foreach(lc, queryDesc->plannedstmt->rtable)
> > + {
> > + RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
> > +
> > + if (rte->rtekind == RTE_RELATION ||
> > + (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
> > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > + rte->rellockmode,
> > + true));
> > + }
> > + }
> > +#endif
> > + queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
> > + queryDesc->params,
> > + CurrentResourceOwner,
> > + true,
> > + eflags);
> > + }
> > +#ifdef USE_ASSERT_CHECKING
> > + else
> > + {
> > + /*
> > + * A prep EState was provided, meaning pruning-aware locking
> > + * should have locked at least the unpruned relations.
> > + */
> > + if (!IsParallelWorker())
> > + {
> > + int rtindex = -1;
> > +
> > + while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
> > + rtindex)) >= 0)
> > + {
> > + RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
> > +
> > + Assert(rte->rtekind == RTE_RELATION ||
> > + (rte->rtekind == RTE_SUBQUERY &&
> > + rte->relid != InvalidOid));
> > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > + rte->rellockmode, true));
> > + }
> > + }
> > + }
> > +#endif
> >
> > So the invariant is: if no prep EState was provided, every relation in
> > the plan is locked; if one was provided, at least the unpruned
> > relations are locked. Both are checked in assert builds.
> >
> > I think this covers the main concerns, but I may be missing something.
> > If anyone sees a problem with this approach, I'd like to hear about
> > it.
>
> Here's v7. Some plancache.c changes that I'd made were in the wrong
> patch in v6; this version puts them where they belong.
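For readers following along, the lock invariant quoted above can be
modeled in a few lines of standalone C. This is only an illustrative
sketch, not PostgreSQL code: toy_locks_ok(), TOY_NRELS and the plain
bool arrays are hypothetical stand-ins for the plan's range table,
es_unpruned_relids and CheckRelationOidLockedByMe().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NRELS 4             /* hypothetical: relations in the plan */

/*
 * Returns true if the toy lock table satisfies the invariant.
 *
 * locked[i]  - whether relation i is locked by this backend
 * unpruned   - NULL when no prep state was provided; otherwise
 *              unpruned[i] says whether relation i survived pruning
 */
static bool
toy_locks_ok(const bool *locked, const bool *unpruned)
{
    assert(locked != NULL);

    for (size_t i = 0; i < TOY_NRELS; i++)
    {
        /* Without a prep state, every relation must be locked. */
        if (unpruned == NULL && !locked[i])
            return false;
        /* With one, only the unpruned relations must be locked. */
        if (unpruned != NULL && unpruned[i] && !locked[i])
            return false;
    }
    return true;
}
```

In this model, a lock table that covers only the survivors passes when
the unpruned set is supplied and fails when NULL is passed instead,
which is the "caller dropped the prep EState" case the debug check is
meant to catch.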
Attached is an updated set. One more fix: I added an Assert in
SPI_cursor_open_internal()'s !plan->saved path to verify that
prep_estates is NIL. Unsaved plans always take the custom plan path,
so pruning-aware locking never applies, but it's worth guarding
explicitly since the copyObject/ReleaseCachedPlan sequence that
follows would not be safe otherwise. Also changed
SPI_plan_get_cached_plan() to pass NULL for cprep, since it only
returns the CachedPlan pointer and has no way to deliver prep_estates
to anyone.
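The opt-in behavior described here (pass NULL and everything is
locked; pass a prep struct and only survivors are locked, with the
result handed back to the caller) can be sketched as a standalone toy
model. Everything below is hypothetical and only mirrors the shape of
the interface: ToyPrepData stands in for CachedPlanPrepData,
toy_get_cached_plan() for GetCachedPlan(), and the "pruning" rule is
invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPARTS 4            /* hypothetical number of partitions */

/* Hypothetical stand-in for CachedPlanPrepData. */
typedef struct ToyPrepData
{
    bool have_result;           /* set once pruning ran in the getter */
    bool survives[TOY_NPARTS];  /* which partitions survived pruning */
} ToyPrepData;

static bool toy_locked[TOY_NPARTS];     /* toy lock table */

/* Invented pruning rule: only the partition equal to param survives. */
static void
toy_prune(int param, bool *survives)
{
    for (int i = 0; i < TOY_NPARTS; i++)
        survives[i] = (i == param);
}

/*
 * With prep == NULL, lock every partition, as before.  Otherwise lock
 * only the survivors and report the pruning result through *prep, which
 * the caller is then responsible for delivering to execution.
 */
static void
toy_get_cached_plan(int param, ToyPrepData *prep)
{
    assert(param >= 0 && param < TOY_NPARTS);

    if (prep == NULL)
    {
        for (int i = 0; i < TOY_NPARTS; i++)
            toy_locked[i] = true;
        return;
    }

    toy_prune(param, prep->survives);
    for (int i = 0; i < TOY_NPARTS; i++)
    {
        if (prep->survives[i])
            toy_locked[i] = true;
    }
    prep->have_result = true;
}

/* Helpers for inspecting the toy lock table. */
static int
toy_count_locks(void)
{
    int n = 0;

    for (int i = 0; i < TOY_NPARTS; i++)
    {
        if (toy_locked[i])
            n++;
    }
    return n;
}

static void
toy_reset_locks(void)
{
    for (int i = 0; i < TOY_NPARTS; i++)
        toy_locked[i] = false;
}
```

A caller with no way to hand the result onward would pass NULL, as
SPI_plan_get_cached_plan() now does, and simply get the old
lock-everything behavior.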
Stepping back -- the core question is whether running executor logic
(pruning) inside GetCachedPlan() is acceptable at all. The plan cache
and executor have always had a clean boundary: plan cache locks
everything, executor runs. This optimization necessarily crosses that
line, because the information needed to decide which locks to skip
(pruning results) can only come from executor machinery.
The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
limited subset of executor work (range table init, permissions,
pruning), carry the results out through CachedPlanPrepData, and leave
the CachedPlan itself untouched. The executor already has a multi-step
protocol (start/run/end); prep/start/run/end is just a finer
decomposition of what InitPlan() was already doing inside
ExecutorStart().
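As a rough illustration of that decomposition, here is a standalone toy
model of a prep/start/run/end protocol in which the start step reuses a
caller-supplied prep state and otherwise runs prep itself. It is a
sketch under stated assumptions, not the patch's code: ToyEState,
ToyQueryDesc and the toy_* functions are hypothetical names that only
mimic the roles of EState, QueryDesc, ExecutorPrep() and
ExecutorStart().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for EState: tracks which phases have run. */
typedef struct ToyEState
{
    bool prepped;
    bool started;
    int  rows;
} ToyEState;

/* Hypothetical stand-in for QueryDesc. */
typedef struct ToyQueryDesc
{
    ToyEState *estate;          /* NULL unless the caller ran toy_prep() */
} ToyQueryDesc;

static int toy_prep_calls = 0;  /* counts how often prep actually ran */

/* Prep: setup that may run before start (pruning, in the real patch). */
static ToyEState *
toy_prep(void)
{
    ToyEState *estate = calloc(1, sizeof(ToyEState));

    assert(estate != NULL);
    estate->prepped = true;
    toy_prep_calls++;
    return estate;
}

/* Start: reuse a supplied prep state, or fall back to running prep. */
static void
toy_start(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        qd->estate = toy_prep();
    assert(qd->estate->prepped);
    qd->estate->started = true;
}

/* Run: only legal after start; here it just counts one "row". */
static void
toy_run(ToyQueryDesc *qd)
{
    assert(qd->estate != NULL && qd->estate->started);
    qd->estate->rows++;
}

/* End: release the state created by prep. */
static void
toy_end(ToyQueryDesc *qd)
{
    assert(qd->estate != NULL && qd->estate->started);
    free(qd->estate);
    qd->estate = NULL;
}
```

The point of the model is that prep runs exactly once per execution
either way; the only difference is whether the caller or the start step
triggers it.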
Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
function support) and 0005 (parallel worker reuse) are useful
follow-ons but not essential. The optimization works without them for
most cases, and they can be reviewed and committed separately.
If there's a cleaner way to avoid locking pruned partitions without
the plumbing this patch adds, I haven't found it in the year since the
revert. I'd welcome a pointer if you see one. Failing that, I think
this is the right trade-off, but it's a judgment call about where to
hold your nose.
Tom, I'd value your opinion on whether this approach is something
you'd be comfortable seeing in the tree.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v8-0005-Reuse-partition-pruning-results-in-parallel-worke.patch (11.0K, 2-v8-0005-Reuse-partition-pruning-results-in-parallel-worke.patch)
From 4c12c380b75b8684e9c41c80d0c77027cf592e17 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Mar 2026 20:03:58 +0900
Subject: [PATCH v8 5/5] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce CheckInitialPruningResultsInWorker() (debug-builds only)
to verify that the results match what the worker would compute. This
check helps catch inconsistencies across leader and worker pruning
logic.
---
src/backend/executor/execMain.c | 10 +--
src/backend/executor/execParallel.c | 108 +++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 2 +-
src/include/executor/executor.h | 3 +-
4 files changed, 116 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0f95ad88497..9a3700e672f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -207,7 +207,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
queryDesc->params,
CurrentResourceOwner,
- eflags);
+ eflags, true);
}
#ifdef USE_ASSERT_CHECKING
else
@@ -330,7 +330,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
* ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
*
* Performs range table initialization, permission checks, and initial
- * partition pruning if partPruneInfos are present.
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true; false in a parallel worker.
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
@@ -340,7 +341,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
- int eflags)
+ int eflags, bool do_initial_pruning)
{
ResourceOwner oldowner;
EState *estate;
@@ -386,7 +387,8 @@ ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
* to es_part_prune_infos.
*/
ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
CurrentResourceOwner = oldowner;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..2de4b35a16e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, 0, false);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 2d4c57d3deb..0dd4f40c964 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -2102,7 +2102,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
}
prep_estate = ExecutorPrep(plannedstmt, cprep->params,
- cprep->owner, cprep->eflags);
+ cprep->owner, cprep->eflags, true);
Assert(prep_estate);
cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 24604120c27..38848ba0651 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -240,7 +240,8 @@ extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern EState *ExecutorPrep(PlannedStmt *pstmt,
ParamListInfo params,
ResourceOwner owner,
- int eflags);
+ int eflags,
+ bool do_initial_pruning);
/*
* Walk a prep_estates list in step with a parallel stmt_list iteration.
--
2.47.3
[application/octet-stream] v8-0003-Use-pruning-aware-locking-in-cached-plans.patch (41.1K, 3-v8-0003-Use-pruning-aware-locking-in-cached-plans.patch)
From 2e637cbc71a14775e161bde21e1036eca2644a2b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Mar 2026 19:02:04 +0900
Subject: [PATCH v8 3/5] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.
Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this was
an optimization (avoid processing pruned partitions); it is now a
correctness requirement, because pruned partitions may not be locked.
ExecGetRangeTableRelation() already enforces this with an error when
called on a pruned relation.
---
src/backend/commands/prepare.c | 17 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 22 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 20 ++
src/backend/tcop/postgres.c | 7 +-
src/backend/utils/cache/plancache.c | 257 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 35 ++-
src/test/regress/expected/partition_prune.out | 145 ++++++++++
src/test/regress/expected/plancache.out | 62 +++++
src/test/regress/sql/partition_prune.sql | 77 ++++++
src/test/regress/sql/plancache.sql | 51 ++++
15 files changed, 689 insertions(+), 24 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c7bab14b633..fec83cc6fd4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -156,6 +156,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -195,7 +196,9 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -207,7 +210,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -577,6 +580,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -635,8 +639,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -655,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 952a784c924..c0ca72b38dd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -699,6 +699,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..9230f2b554f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4865,8 +4865,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4880,6 +4880,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 380bbc44e97..f1d84f7a350 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1580,6 +1580,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1660,7 +1661,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1670,7 +1674,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
* so must copy the plan into the portal's context. An error here
* will result in leaking our refcount on the plan, but it doesn't
* matter because the plan is unsaved and hence transient anyway.
+ *
+ * Unsaved plans use custom plans, so prep should be a no-op.
*/
+ Assert(cprep.prep_estates == NIL);
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
MemoryContextSwitchTo(oldcontext);
@@ -1686,7 +1693,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -2104,7 +2111,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2503,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2577,11 +2586,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..ddb7902bc89 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. There is one ModifyTable
+ * per query level, so this captures exactly one entry per level.
+ * ExecInitModifyTable() asserts that the recorded index matches
+ * what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 355a490cde9..de362ff1672 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1637,6 +1637,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2018,7 +2019,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2031,7 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 182c16e9b9a..2d4c57d3deb 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The EStates created to do so are saved in cprep for
+ * later reuse by ExecutorStart().
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +991,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1016,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1302,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function
+ * performs initial pruning via ExecutorPrep() and locks only the
+ * surviving partitions. The resulting EStates are stored in
+ * cprep->prep_estates and must be delivered to ExecutorStart() via
+ * QueryDesc->estate (or the equivalent portal/SPI path). Failure
+ * to do so means the executor will operate on relations for which
+ * locks were never acquired. Passing NULL for cprep is always safe;
+ * all partitions are locked as before.
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1321,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1344,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1933,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1986,212 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving initial runtime pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params,
+ cprep->owner, cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * We must always lock the first result relation of each
+ * ModifyTable node in the plan, that is, each one listed in
+ * plannedstmt->firstResultRels, to satisfy the executor
+ * assumptions described in ExecInitModifyTable(). This can
+ * be wasteful, because the first result relation may not be
+ * needed at all if other result relations are unpruned and thus
+ * sufficient for the ModifyTable node's needs. Unfortunately,
+ * we don't have a per-node unpruned_relids set with which to
+ * determine whether other result relations of the node are
+ * included.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up the EStates built for a generic plan.
+ *
+ * This is used in the corner case where CheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the EState list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+ CurrentResourceOwner = oldowner;
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27758ec16fe..4fd9d9bcc56 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..55279cbbda8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 984c51515c6..c22f832d0b1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,38 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ *
+ * prep_estates must reach ExecutorStart() to be adopted for execution.
+ * If the plan is invalidated before that happens, CachedPlanPrepCleanup()
+ * frees them instead. The EStates are allocated in 'context' and their
+ * resources tracked under 'owner', which the caller sets to match the
+ * execution environment (e.g., portal context and resowner).
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +272,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..8e0cc98baca 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,148 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..804dd3c8f4e 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,80 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v8-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 4-v8-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 2ab5fefb9644118a1f1528a53b9a6af90e063edb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v8 4/5] Make SQL function executor track ExecutorPrep state

Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.

At execution time, postquel_start() reparents the prep EState's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.

This allows SQL-language functions to participate in the same
ExecutorPrep machinery as other plan cache users.

Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
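The per-statement pairing described above (one prep state per entry
in the statement list, walked in lockstep) can be illustrated with a
small standalone C sketch. This is only a toy model: ToyCell and the
toy_* functions are hypothetical names invented for illustration and
are not PostgreSQL's List, ListCell, or next_prep_estate(). It shows
the one property the loop relies on, namely that an empty prep list
simply yields NULL for every statement.

```c
#include <assert.h>
#include <stddef.h>

/* Toy singly linked list cell; not PostgreSQL's List/ListCell. */
typedef struct ToyCell
{
    void *value;
    struct ToyCell *next;
} ToyCell;

/*
 * Return the value at *cell and advance, or NULL once the list is
 * exhausted.  An empty list (NULL head) yields NULL on every call.
 */
static void *
toy_next_prep(ToyCell **cell)
{
    void *result = NULL;

    if (*cell != NULL)
    {
        result = (*cell)->value;
        *cell = (*cell)->next;
    }
    return result;
}

/*
 * Walk nstmts statements alongside a prep list and count how many
 * statements received a non-NULL prep state.
 */
static int
toy_count_prepped(int nstmts, ToyCell *prep_head)
{
    ToyCell *prep_cell = prep_head;
    int prepped = 0;

    for (int i = 0; i < nstmts; i++)
    {
        if (toy_next_prep(&prep_cell) != NULL)
            prepped++;
    }
    return prepped;
}

/* Two statements, e.g. an UPDATE plus the INSERT added by a rule. */
static int
toy_demo_two_statements(void)
{
    int update_state = 1;
    int insert_state = 2;
    ToyCell second = {&insert_state, NULL};
    ToyCell first = {&update_state, &second};

    return toy_count_prepped(2, &first);
}
```

With a two-entry prep list both statements get a state; passing a
NULL head instead models a caller that supplies no prep states, in
which case every statement falls back to the lazy path.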
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c0ca72b38dd..f246f051c25 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -73,6 +73,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -658,6 +659,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -696,11 +699,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -721,9 +733,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -765,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1363,6 +1378,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1371,7 +1395,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1462,6 +1486,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v8-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (27.1K, 5-v8-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From a2a0befc44d25df8b549644a7e179923270a0fc6 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v8 2/5] Introduce ExecutorPrep and refactor executor startup

Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep() builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present, and returns it
directly to the caller.

ExecutorStart() now checks if QueryDesc->estate is already set
(indicating ExecutorPrep() was called earlier). If so, it reuses
the EState to avoid redoing range table setup and pruning.
Otherwise, it invokes ExecutorPrep() itself and adopts the
resulting EState for the duration of the query. This keeps the
executor startup behavior unchanged while making the setup work
callable separately when needed.

CreateQueryDesc() grows a prep_estate argument to accept an
optionally pre-created EState and stores it in the QueryDesc.
Portals, SPI, SQL functions, and EXPLAIN are wired to carry
optional EState pointers alongside the PlannedStmt list, but most
callers still pass NULL and let ExecutorStart() perform the setup
lazily.

ExecutorPrep() requires the caller to have established an active
snapshot, as partition pruning expressions may call PL functions
that internally require one (e.g., via EnsurePortalSnapshotExists()).

Update executor/README and related comments to document the new
control flow and the separation between preparation and execution.

Note that as of this commit, ExecutorStart() is the only caller of
ExecutorPrep(), so there is no semantic change in behavior. Later
commits will add specialized callers that invoke ExecutorPrep()
earlier to enable pruning-aware locking in cached plans.
---
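The reuse-or-create behavior described in the commit message can be
modeled in a few lines of standalone C. This is only a toy sketch:
ToyEState, ToyQueryDesc, and every toy_* function are hypothetical
stand-ins invented for illustration, not the PostgreSQL definitions,
and the real ExecutorPrep() performs permission checks, range table
setup, and pruning rather than setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are the PostgreSQL definitions. */
typedef struct ToyEState
{
    int pruning_done;       /* set once "initial pruning" has run */
} ToyEState;

typedef struct ToyQueryDesc
{
    ToyEState *estate;      /* NULL until prepared */
} ToyQueryDesc;

static int toy_prep_calls = 0;  /* counts how often setup work runs */

/* Models ExecutorPrep(): build the state and do the one-time setup. */
static ToyEState *
toy_executor_prep(void)
{
    ToyEState *estate = malloc(sizeof(ToyEState));

    assert(estate != NULL);
    estate->pruning_done = 1;
    toy_prep_calls++;
    return estate;
}

/* Models CreateQueryDesc() with an optional pre-created state. */
static ToyQueryDesc
toy_create_query_desc(ToyEState *prep_estate)
{
    ToyQueryDesc qd;

    qd.estate = prep_estate;
    return qd;
}

/* Models ExecutorStart(): reuse a supplied state, else prepare lazily. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        qd->estate = toy_executor_prep();
    assert(qd->estate->pruning_done);
}

/* Run one lazy and one early-prep query; return total prep calls. */
static int
toy_demo(void)
{
    ToyQueryDesc lazy;
    ToyQueryDesc early;

    toy_prep_calls = 0;
    lazy = toy_create_query_desc(NULL);
    early = toy_create_query_desc(toy_executor_prep());

    toy_executor_start(&lazy);      /* prep happens inside start */
    toy_executor_start(&early);     /* prep already done, reused */

    free(lazy.estate);
    free(early.estate);
    return toy_prep_calls;
}
```

In this model the setup work runs exactly once per query whether it
was done early or lazily, which mirrors the commit message's claim
that startup behavior is unchanged when no caller prepares ahead of
time.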
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 164 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 +++++
src/include/nodes/execnodes.h | 1 -
src/include/utils/portal.h | 2 +
20 files changed, 229 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..e09303491d2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -877,7 +877,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 296ea8a1ed2..02027c429e1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -372,7 +372,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -494,7 +494,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -552,7 +553,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b98801d08f2..939e7a632f0 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1174,7 +1174,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..c7bab14b633 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -577,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -652,14 +655,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c58a2abe9a7..0f95ad88497 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -147,7 +148,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +173,70 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +326,73 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present.
+ *
+ * Returns an EState that the caller must either pass to ExecutorStart()
+ * for reuse or free via FreeExecutorState() if execution will not proceed.
+ * GetCachedPlan() uses this to determine which partitions to lock after
+ * pruning; if the resulting EState is not delivered to ExecutorStart(),
+ * the executor would operate on unlocked relations.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * Also perform initial pruning to compute the subset of child subplans
+ * that will be executed. The results, which are bitmapsets of selected
+ * child indexes, are saved in es_part_prune_results. This list is parallel
+ * to es_part_prune_infos.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,38 +968,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..952a784c924 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1369,7 +1369,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..380bbc44e97 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,9 +2619,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2695,7 +2701,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b3563113219..355a490cde9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..443b583637c 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -286,6 +286,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..71ebe38bc86 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -64,7 +64,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 064df01811e..24604120c27 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc.
+ *
+ * Safe when prep_estates is NIL; just returns NULL for every call.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0716c5a9aed..42d75693d43 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -784,7 +784,6 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
-
/*
* ExecRowMark -
* runtime representation of FOR [KEY] UPDATE/SHARE clauses
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
[application/octet-stream] v8-0001-Refactor-partition-pruning-initialization-for-cla.patch (10.2K, 6-v8-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From a79af61882f1ff696d46f612a5b3a8ce50ee75d6 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 15:08:52 +0900
Subject: [PATCH v8 1/5] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called at a time when the PARAM_EXEC
parameters have not yet been set up.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 103 +++++++++++++++++++--------
src/include/executor/execPartition.h | 1 +
3 files changed, 74 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..c58a2abe9a7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -870,6 +870,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..feea9fdfde0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1943,6 +1942,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1967,6 +1969,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1974,11 +1999,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,20 +2021,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -2017,8 +2035,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2152,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2391,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,10 +2404,29 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
+ }
+#endif
+ }
j++;
}
@@ -2490,9 +2523,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2554,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
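The two-phase split this patch introduces (build all pruning state first, run initial pruning as a separate step) can be modelled outside the tree. The following is a stand-alone sketch, not PostgreSQL code: partitions are bits in a uint32_t rather than a Bitmapset, and every name (ToyPruneInfo, ToyPruneState, toy_create_states, toy_do_initial_pruning) is hypothetical. As a simplification, the model treats "no initial pruning steps" and "initial pruning skipped" the same way, recording all leaf partitions as unpruned during setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct ToyPruneInfo
{
	uint32_t	leaf_parts;			/* bitmask of leaf partitions */
	bool		has_initial_steps;	/* are there initial pruning steps? */
} ToyPruneInfo;

/* Hypothetical stand-in for PartitionPruneState. */
typedef struct ToyPruneState
{
	uint32_t	leaf_parts;
	bool		do_initial_prune;
} ToyPruneState;

/*
 * Phase 1: build one state per info.  When pruning will not run for an
 * entry, all of its leaf partitions are recorded as unpruned right here.
 */
void
toy_create_states(const ToyPruneInfo *infos, size_t n, bool skip_pruning,
				  ToyPruneState *states, uint32_t *unpruned)
{
	assert(infos != NULL && states != NULL && unpruned != NULL);

	for (size_t i = 0; i < n; i++)
	{
		states[i].leaf_parts = infos[i].leaf_parts;
		states[i].do_initial_prune =
			infos[i].has_initial_steps && !skip_pruning;
		if (!states[i].do_initial_prune)
			*unpruned |= infos[i].leaf_parts;
	}
}

/*
 * Phase 2: run initial pruning over already-built states.  "matching" is
 * the set of partitions that the pruning steps would keep.
 */
void
toy_do_initial_pruning(const ToyPruneState *states, size_t n,
					   uint32_t matching, uint32_t *unpruned)
{
	assert(states != NULL && unpruned != NULL);

	for (size_t i = 0; i < n; i++)
	{
		if (states[i].do_initial_prune)
			*unpruned |= states[i].leaf_parts & matching;
	}
}
```

A caller that only needs the pruning metadata can stop after the first function; the second one never has to know how the states were built.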
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-22 15:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
2025-02-22 17:02 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-23 08:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-23 12:46 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-25 02:51 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 03:06 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-20 13:25 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 15:38 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-21 10:22 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-22 08:12 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-06-20 12:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-17 12:11 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-22 06:43 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-12 14:17 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-17 12:50 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-20 07:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-24 03:29 ` Re: generic plans and "initial" pruning Chao Li <[email protected]>
2025-11-25 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-02-11 04:05 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-07 09:54 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-09 04:41 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-19 17:20 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2026-03-25 07:39 ` Amit Langote <[email protected]>
2026-03-26 09:24 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-03-25 07:39 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Fri, Mar 20, 2026 at 2:20 AM Amit Langote <[email protected]> wrote:
> On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> > On Sat, Mar 7, 2026 at 6:54 PM Amit Langote <[email protected]> wrote:
> > > Attached is v6 of the patch series. I've been working toward
> > > committing this, so I wanted to lay out the ExecutorPrep() design and
> > > the key trade-offs before doing so.
> > >
> > > When a cached generic plan references a partitioned table,
> > > GetCachedPlan() locks all partitions upfront via
> > > AcquireExecutorLocks(), even those that initial pruning will
> > > eliminate. But initial partition pruning only runs later during
> > > ExecutorStart(). Moving pruning earlier requires some executor setup
> > > (range table, permissions, pruning state), and ExecutorPrep() is the
> > > vehicle for that. Unlike the approach reverted last May, this
> > > keeps the CachedPlan itself unchanged -- all per-execution state flows
> > > through a separate CachedPlanPrepData that the caller provides.
> > >
> > > The approach also keeps GetCachedPlan()'s interface
> > > backward-compatible: the new CachedPlanPrepData argument is optional.
> > > If a caller passes NULL, all partitions are locked as before and
> > > nothing changes. This means existing callers and any new code that
> > > calls GetCachedPlan() without caring about pruning-aware locking just
> > > works.
> > >
> > > The risk is on the other side: if a caller does pass a
> > > CachedPlanPrepData, GetCachedPlan() will lock only the surviving
> > > partitions and populate prep_estates with the EStates that
> > > ExecutorPrep() created. The caller then must make those EStates
> > > available to ExecutorStart() -- via QueryDesc->estate,
> > > portal->prep_estates, or the equivalent path for SPI and SQL
> > > functions. If it fails to do so, ExecutorStart() will call
> > > ExecutorPrep() again, which may compute different pruning results than
> > > the original call, potentially expecting locks on relations that were
> > > never acquired. The executor would then operate on relations it
> > > doesn't hold locks on.
> > >
> > > So the contract is: if you opt in to pruning-aware locking by passing
> > > CachedPlanPrepData, you must complete the pipeline by delivering the
> > > prep EStates to the executor. In the current patch, all the call sites
> > > that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
> > > EXPLAIN) do thread the EStates through correctly, and I've tried to
> > > make the plumbing straightforward enough that it's hard to get wrong.
> > > But it is a new invariant that didn't exist before, and a caller that
> > > gets it wrong would fail silently rather than with an obvious error.
> > >
> > > To catch such violations, I've added a debug-only check in
> > > standard_ExecutorStart() that fires when no prep EState was provided.
> > > It iterates over the plan's rtable and verifies that every lockable
> > > relation is actually locked. It should always be true if
> > > AcquireExecutorLocks() locked everything, but would fail if
> > > pruning-aware locking happened upstream and the caller dropped the
> > > prep EState. The check is skipped in parallel workers, which acquire
> > > relation locks lazily in ExecGetRangeTableRelation().
> > >
> > > + if (queryDesc->estate == NULL)
> > > + {
> > > +#ifdef USE_ASSERT_CHECKING
> > > + if (!IsParallelWorker())
> > > + {
> > > + ListCell *lc;
> > > +
> > > + foreach(lc, queryDesc->plannedstmt->rtable)
> > > + {
> > > + RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
> > > +
> > > + if (rte->rtekind == RTE_RELATION ||
> > > + (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
> > > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > > + rte->rellockmode,
> > > + true));
> > > + }
> > > + }
> > > +#endif
> > > + queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
> > > + queryDesc->params,
> > > + CurrentResourceOwner,
> > > + true,
> > > + eflags);
> > > + }
> > > +#ifdef USE_ASSERT_CHECKING
> > > + else
> > > + {
> > > + /*
> > > + * A prep EState was provided, meaning pruning-aware locking
> > > + * should have locked at least the unpruned relations.
> > > + */
> > > + if (!IsParallelWorker())
> > > + {
> > > + int rtindex = -1;
> > > +
> > > + while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
> > > + rtindex)) >= 0)
> > > + {
> > > + RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
> > > +
> > > + Assert(rte->rtekind == RTE_RELATION ||
> > > + (rte->rtekind == RTE_SUBQUERY &&
> > > + rte->relid != InvalidOid));
> > > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > > + rte->rellockmode, true));
> > > + }
> > > + }
> > > + }
> > > +#endif
> > >
> > > So the invariant is: if no prep EState was provided, every relation in
> > > the plan is locked; if one was provided, at least the unpruned
> > > relations are locked. Both are checked in assert builds.
> > >
> > > I think this covers the main concerns, but I may be missing something.
> > > If anyone sees a problem with this approach, I'd like to hear about
> > > it.
> >
> > Here's v7. Some plancache.c changes that I'd made were in the wrong
> > patch in v6; this version puts them where they belong.
>
> Attached is an updated set. One more fix: I added an Assert in
> SPI_cursor_open_internal()'s !plan->saved path to verify that
> prep_estates is NIL. Unsaved plans always take the custom plan path,
> so pruning-aware locking never applies, but it's worth guarding
> explicitly since the copyObject/ReleaseCachedPlan sequence that
> follows would not be safe otherwise. Also changed
> SPI_plan_get_cached_plan() to pass NULL for cprep, since it only
> returns the CachedPlan pointer and has no way to deliver prep_estates
> to anyone.
>
> Stepping back -- the core question is whether running executor logic
> (pruning) inside GetCachedPlan() is acceptable at all. The plan cache
> and executor have always had a clean boundary: plan cache locks
> everything, executor runs. This optimization necessarily crosses that
> line, because the information needed to decide which locks to skip
> (pruning results) can only come from executor machinery.
>
> The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
> limited subset of executor work (range table init, permissions,
> pruning), carry the results out through CachedPlanPrepData, and leave
> the CachedPlan itself untouched. The executor already has a multi-step
> protocol: start/run/end. prep/start/run/end is just a finer
> decomposition of what InitPlan() was already doing inside
> ExecutorStart().
>
> Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
> function support) and 0005 (parallel worker reuse) are useful
> follow-ons but not essential. The optimization works without them for
> most cases, and they can be reviewed and committed separately.
>
> If there's a cleaner way to avoid locking pruned partitions without
> the plumbing this patch adds, I haven't found it in the year since the
> revert. I'd welcome a pointer if you see one. Failing that, I think
> this is the right trade-off, but it's a judgment call about where to
> hold your nose.
>
> Tom, I'd value your opinion on whether this approach is something
> you'd be comfortable seeing in the tree.
Attached is an updated set with some cleanup after another pass.
- Removed ExecCreatePartitionPruneStates() from 0001. In 0001-0003,
ExecDoInitialPruning() handles both setup and pruning internally; the
split isn't needed yet.
- Tightened commit messages to describe what each commit does now, not
what later commits will use it for. In particular, 0002 is upfront
that the portal/SPI/EXPLAIN plumbing is scaffolding that 0003 lights
up.
- Updated setrefs.c comment for firstResultRels to drop a blanket
claim about one ModifyTable per query level.
As before, 0001-0003 are the focus, and maybe 0004, which teaches the
new GetCachedPlan() pruning-aware contract to its relatively new user
in functions.c.
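The locking invariant quoted above (no prep EState: every relation in the plan is locked; prep EState supplied: at least the unpruned relations are locked) can be modelled in a few lines. This is a stand-alone sketch, not the assert-build check from the patch: relations are array indexes, lock state and the unpruned set are plain bool arrays, and the name locks_satisfy_invariant is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the lock invariant.
 *
 * locked[i]   says whether relation i is locked by us.
 * unpruned[i] says whether relation i survived initial pruning; pass NULL
 *             to model "no prep EState was provided".
 */
bool
locks_satisfy_invariant(const bool *locked, const bool *unpruned,
						size_t nrels)
{
	assert(locked != NULL);

	for (size_t i = 0; i < nrels; i++)
	{
		/*
		 * Without prep state every relation must be locked; with it, only
		 * the unpruned ones must be.
		 */
		bool		must_be_locked = (unpruned == NULL) || unpruned[i];

		if (must_be_locked && !locked[i])
			return false;
	}
	return true;
}
```

Passing NULL for unpruned after only the surviving relations were locked models the failure the debug check is meant to catch: a pruned relation was never locked, so the "everything is locked" branch reports a violation.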
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v9-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 2-v9-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 3aedeffabed40d317f1f7e2bb80bce8063429795 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v9 4/5] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c0ca72b38dd..f246f051c25 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -73,6 +73,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -658,6 +659,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -696,11 +699,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -721,9 +733,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -765,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1363,6 +1378,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1371,7 +1395,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1462,6 +1486,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v9-0005-Reuse-partition-pruning-results-in-parallel-worke.patch (15.8K, 3-v9-0005-Reuse-partition-pruning-results-in-parallel-worke.patch)
From ddcbd693f9aa8498c06b4f20fe4df20ff98974c5 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:57 +0900
Subject: [PATCH v9 5/5] Reuse partition pruning results in parallel workers

Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.

Factor the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. Parallel workers need to set up pruning state without
performing initial pruning, since they receive the leader's results
instead.

Introduce CheckInitialPruningResultsInWorker() (debug-builds only)
to verify that the results match what the worker would compute.
This check helps catch inconsistencies across leader and worker
pruning logic.
---
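As a rough illustration of the handoff described in the commit message above
(this is not code from the patch), the following standalone C sketch mocks the
flow: the leader computes an initial pruning result, serializes it, and the
worker deserializes and reuses it, optionally cross-checking against its own
recomputation. Every mock_* name, the ">= param" pruning rule, and the 64-bit
mask standing in for a Bitmapset are assumptions made for illustration only;
the patch itself uses nodeToString()/stringToNode() and shm_toc chunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical pruning rule: subplan i survives when i >= param.
 * nsubplans must be at most 64 because a uint64_t stands in for a Bitmapset.
 */
uint64_t
mock_initial_pruning(int nsubplans, int param)
{
    uint64_t    valid = 0;

    for (int i = 0; i < nsubplans; i++)
    {
        if (i >= param)
            valid |= UINT64_C(1) << i;
    }
    return valid;
}

/* Leader side: stand-in for nodeToString() plus copying into shared memory. */
int
mock_leader_serialize(uint64_t valid, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%llx", (unsigned long long) valid);
}

/* Worker side: stand-in for stringToNode() on the shared-memory chunk. */
uint64_t
mock_worker_deserialize(const char *buf)
{
    return (uint64_t) strtoull(buf, NULL, 16);
}

/*
 * Debug-style cross-check: recompute locally and compare with the leader's
 * result.  Returns false on divergence instead of raising an error.
 */
bool
mock_check_in_worker(uint64_t from_leader, int nsubplans, int worker_param)
{
    return mock_initial_pruning(nsubplans, worker_param) == from_leader;
}
```

In this mock, a worker that evaluates the pruning parameter differently from
the leader makes mock_check_in_worker() return false, which is the kind of
divergence that reusing the leader's results is meant to avoid.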
src/backend/executor/execMain.c | 25 +++++--
src/backend/executor/execParallel.c | 108 ++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 44 ++++++++---
src/backend/utils/cache/plancache.c | 2 +-
src/include/executor/execPartition.h | 1 +
src/include/executor/executor.h | 3 +-
6 files changed, 161 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 336bd4d09b3..5fa312436fb 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -207,7 +207,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
queryDesc->params,
CurrentResourceOwner,
- eflags);
+ eflags, true);
}
#ifdef USE_ASSERT_CHECKING
else
@@ -330,7 +330,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
* ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
*
* Performs range table initialization, permission checks, and initial
- * partition pruning if partPruneInfos are present.
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true; false in a parallel worker.
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
@@ -341,7 +342,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
- int eflags)
+ int eflags, bool do_initial_pruning)
{
ResourceOwner oldowner;
EState *estate;
@@ -377,14 +378,22 @@ ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
CurrentResourceOwner = owner;
/*
- * Set up PartitionPruneState structures and perform initial partition
- * pruning to compute the subset of child subplans that will be
- * executed. The results, which are bitmapsets of selected child
- * indexes, are saved in es_part_prune_results, parallel to
+ * Set up PartitionPruneState structures needed for initial
+ * partition pruning.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to
+ * compute the subset of child subplans that will be executed.
+ * The results, which are bitmapsets of selected child indexes,
+ * are saved in es_part_prune_results, parallel to
* es_part_prune_infos. RT indexes of surviving partitions are
* added to es_unpruned_relids.
+ *
+ * Parallel workers pass false here and instead receive the
+ * leader's pruning results via shared memory.
*/
- ExecDoInitialPruning(estate);
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
CurrentResourceOwner = oldowner;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..2de4b35a16e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, 0, false);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 2a3af006f77..47322614aad 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1942,6 +1942,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1967,15 +1970,40 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ *
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
* assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
@@ -1996,18 +2024,12 @@ ExecDoInitialPruning(EState *estate)
ListCell *lc;
Assert(estate->es_part_prune_results == NULL);
- foreach(lc, estate->es_part_prune_infos)
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment. RT indexes
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index bb62c648899..879b2d012a1 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -2102,7 +2102,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
}
prep_estate = ExecutorPrep(plannedstmt, cprep->params,
- cprep->owner, cprep->eflags);
+ cprep->owner, cprep->eflags, true);
Assert(prep_estate);
cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 4505ceaca3c..8e5fde965ed 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -240,7 +240,8 @@ extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern EState *ExecutorPrep(PlannedStmt *pstmt,
ParamListInfo params,
ResourceOwner owner,
- int eflags);
+ int eflags,
+ bool do_initial_pruning);
/*
* Walk a prep_estates list in step with a parallel stmt_list iteration.
--
2.47.3
[application/octet-stream] v9-0003-Use-pruning-aware-locking-in-cached-plans.patch (41.8K, 4-v9-0003-Use-pruning-aware-locking-in-cached-plans.patch)
From a5cbee90d2f57c0b775ecc9d959bdcf9fe864075 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Mar 2026 19:02:04 +0900
Subject: [PATCH v9 3/5] Use pruning-aware locking in cached plans

Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.

Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.

Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.

To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.

Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.

Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this was
an optimization (avoid processing pruned partitions); it is now a
correctness requirement, because pruned partitions may not be locked.
ExecGetRangeTableRelation() already enforces this with an error when
called on a pruned relation.
---
src/backend/commands/prepare.c | 17 +-
src/backend/executor/execMain.c | 4 +
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 22 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/postgres.c | 7 +-
src/backend/utils/cache/plancache.c | 257 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 35 ++-
src/test/regress/expected/partition_prune.out | 145 ++++++++++
src/test/regress/expected/plancache.out | 62 +++++
src/test/regress/sql/partition_prune.sql | 77 ++++++
src/test/regress/sql/plancache.sql | 51 ++++
16 files changed, 691 insertions(+), 24 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c7bab14b633..fec83cc6fd4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -156,6 +156,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -195,7 +196,9 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -207,7 +210,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -577,6 +580,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -635,8 +639,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -655,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 282c9871de0..336bd4d09b3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -334,6 +334,10 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
+ * GetCachedPlan() uses this to determine, based on initial pruning
+ * results, which partitions to lock; if the resulting EState is not
+ * delivered to ExecutorStart(), the executor would operate on unlocked
+ * relations. See the assert checks in standard_ExecutorStart().
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 952a784c924..c0ca72b38dd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -699,6 +699,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..9230f2b554f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4865,8 +4865,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4880,6 +4880,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 380bbc44e97..f1d84f7a350 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1580,6 +1580,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1660,7 +1661,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1670,7 +1674,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
* so must copy the plan into the portal's context. An error here
* will result in leaking our refcount on the plan, but it doesn't
* matter because the plan is unsaved and hence transient anyway.
+ *
+ * Unsaved plans use custom plans, so prep should be a no-op.
*/
+ Assert(cprep.prep_estates == NIL);
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
MemoryContextSwitchTo(oldcontext);
@@ -1686,7 +1693,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -2104,7 +2111,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2503,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2577,11 +2586,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..8c9956e687e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. ExecInitModifyTable() asserts
+ * that the recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 355a490cde9..de362ff1672 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1637,6 +1637,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2018,7 +2019,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2031,7 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..bb62c648899 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The EStates created to do so are saved in cprep for
+ * later reuse by ExecutorStart().
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +991,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1016,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1302,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function
+ * performs initial pruning via ExecutorPrep() and locks only the
+ * surviving partitions. The resulting EStates are stored in
+ * cprep->prep_estates and must be delivered to ExecutorStart() via
+ * QueryDesc->estate (or the equivalent portal/SPI path). Failure
+ * to do so means the executor will operate on relations for which
+ * locks were never acquired. Passing NULL for cprep is always safe;
+ * all partitions are locked as before.
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1321,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1344,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1933,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1986,212 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext = MemoryContextSwitchTo(cprep->context);
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params,
+ cprep->owner, cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * We must always lock the first result relation of each
+ * ModifyTable node in the plan, that is, each relation listed in
+ * plannedstmt->firstResultRels, even if it was pruned, to
+ * satisfy the executor assumptions described in
+ * ExecInitModifyTable(). This can be wasteful, because the
+ * first result relation may not be needed at all if other
+ * result relations are unpruned and thus sufficient for the
+ * ModifyTable node's needs. Unfortunately, there is no per-node
+ * unpruned_relids set from which to determine that other result
+ * relations are included.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up EState built for a generic plan.
+ *
+ * This is used in the corner case where CheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the prep_estates list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+ CurrentResourceOwner = oldowner;
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27758ec16fe..4fd9d9bcc56 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..55279cbbda8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..177150a5848 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,38 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ *
+ * prep_estates must reach ExecutorStart() to be adopted for execution.
+ * If the plan is invalidated before that happens, CachedPlanPrepCleanup()
+ * frees them instead. The EStates are allocated in 'context' and their
+ * resources tracked under 'owner', which the caller sets to match the
+ * execution environment (e.g., portal context and resowner).
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +272,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..8e0cc98baca 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,148 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..804dd3c8f4e 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,80 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
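For readers following the lock-set logic in AcquireExecutorLocksUnpruned() above, here is a minimal standalone sketch of the set arithmetic only. It assumes a 64-bit mask as a stand-in for Bitmapset; RelidSet, relid_set_add() and relids_to_lock() are hypothetical names used for illustration and are not part of the patch or of PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a Bitmapset: bit i set means RT index i is a
 * member.  Only RT indexes below 64 can be represented in this sketch.
 */
typedef uint64_t RelidSet;

/* Return 'set' with 'rtindex' added. */
RelidSet
relid_set_add(RelidSet set, unsigned rtindex)
{
	assert(rtindex < 64);
	return set | ((RelidSet) 1 << rtindex);
}

/*
 * Compute the relations that still need locking once the unprunable ones
 * have been locked: (unpruned minus unprunable), plus the first result
 * relation of every ModifyTable node, whether or not it survived pruning.
 * This mirrors the bms_difference() / bms_add_member() sequence in the
 * patch, under the assumptions stated above.
 */
RelidSet
relids_to_lock(RelidSet unpruned, RelidSet unprunable,
			   const unsigned *first_result_rels, int nfirst)
{
	RelidSet	lock_relids = unpruned & ~unprunable;

	for (int i = 0; i < nfirst; i++)
		lock_relids = relid_set_add(lock_relids, first_result_rels[i]);

	return lock_relids;
}
```

With unpruned = {1, 3, 4}, unprunable = {1} and a single first result relation 2, the sketch yields {2, 3, 4}: RT index 1 is skipped because it was already locked, and RT index 2 is locked even though it was pruned.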
[application/octet-stream] v9-0001-Refactor-executor-s-initial-partition-pruning-set.patch (7.3K, 5-v9-0001-Refactor-executor-s-initial-partition-pruning-set.patch)
From 6b2a9740b49a5238569cfeeb11fa632225ec2cfb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v9 1/5] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions are added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() will already have added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
[application/octet-stream] v9-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (25.5K, 6-v9-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From 32267b58bdf9db56a716abde9fcc3e4e8fac6fee Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:07:18 +0900
Subject: [PATCH v9 2/5] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorStart() calls it to build the EState, keeping
behavior unchanged.
If QueryDesc->estate is already set when ExecutorStart() is called,
the existing EState is reused and ExecutorPrep() is skipped. This
allows a later commit to supply a pre-built EState from outside
the executor.
Add scaffolding for carrying an optional prep EState through
CreateQueryDesc, PortalDefineQuery, and SPI. All callers currently
pass NULL/NIL; the next commit populates these to enable
pruning-aware locking in cached plans.
In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 157 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 ++++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 +++++
src/include/utils/portal.h | 2 +
19 files changed, 223 insertions(+), 50 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..b9bd5ba7078 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1011,7 +1011,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e4b70166b0e..24c0c235fd3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -372,7 +372,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -494,7 +494,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -552,7 +553,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b98801d08f2..939e7a632f0 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1174,7 +1174,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..c7bab14b633 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -577,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -652,14 +655,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..282c9871de0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -147,7 +148,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +173,70 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +326,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present.
+ *
+ * Returns an EState that the caller must either pass to ExecutorStart()
+ * for reuse or free via FreeExecutorState() if execution will not proceed.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures and perform initial partition
+ * pruning to compute the subset of child subplans that will be
+ * executed. The results, which are bitmapsets of selected child
+ * indexes, are saved in es_part_prune_results, parallel to
+ * es_part_prune_infos. RT indexes of surviving partitions are
+ * added to es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,37 +962,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..952a784c924 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1369,7 +1369,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..380bbc44e97 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,9 +2619,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2695,7 +2701,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b3563113219..355a490cde9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..443b583637c 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -286,6 +286,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..71ebe38bc86 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -64,7 +64,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..4505ceaca3c 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc.
+ *
+ * Safe when prep_estates is NIL; just returns NULL for every call.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2026-03-26 09:24 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, Mar 25, 2026 at 4:39 PM Amit Langote <[email protected]> wrote:
> On Fri, Mar 20, 2026 at 2:20 AM Amit Langote <[email protected]> wrote:
> > On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> > Stepping back -- the core question is whether running executor logic
> > (pruning) inside GetCachedPlan() is acceptable at all. The plan cache
> > and executor have always had a clean boundary: plan cache locks
> > everything, executor runs. This optimization necessarily crosses that
> > line, because the information needed to decide which locks to skip
> > (pruning results) can only come from executor machinery.
> >
> > The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
> > limited subset of executor work (range table init, permissions,
> > pruning), carry the results out through CachedPlanPrepData, and leave
> > the CachedPlan itself untouched. The executor already has a multi-step
> > protocol: start/run/end. prep/start/run/end is just a finer
> > decomposition of what InitPlan() was already doing inside
> > ExecutorStart().
> >
> > Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
> > function support) and 0005 (parallel worker reuse) are useful
> > follow-ons but not essential. The optimization works without them for
> > most cases, and they can be reviewed and committed separately.
> >
> > If there's a cleaner way to avoid locking pruned partitions without
> > the plumbing this patch adds, I haven't found it in the year since the
> > revert. I'd welcome a pointer if you see one. Failing that, I think
> > this is the right trade-off, but it's a judgment call about where to
> > hold your nose.
> >
> > Tom, I'd value your opinion on whether this approach is something
> > you'd be comfortable seeing in the tree.
>
> Attached is an updated set with some cleanup after another pass.
>
> - Removed ExecCreatePartitionPruneStates() from 0001. In 0001-0003,
> ExecDoInitialPruning() handles both setup and pruning internally; the
> split isn't needed yet.
>
> - Tightened commit messages to describe what each commit does now, not
> what later commits will use it for. In particular, 0002 is upfront
> that the portal/SPI/EXPLAIN plumbing is scaffolding that 0003 lights
> up.
>
> - Updated setrefs.c comment for firstResultRels to drop a blanket
> claim about one ModifyTable per query level.
>
> As before, 0001-0003 are the focus, and possibly 0004, which teaches the
> new GetCachedPlan() pruning-aware contract to its relatively new user in
> functions.c.
While reviewing the patch more carefully, I realized there's a
correctness issue when rule rewriting causes a single statement to
expand into multiple PlannedStmts in one CachedPlan.
PortalRunMulti() executes those statements sequentially, with
CommandCounterIncrement() between them, so the ExecutorStart() of a
later statement (call it Q2) normally sees the effects of an earlier
one (Q1).
With the patch, though, AcquireExecutorLocksUnpruned() runs
ExecutorPrep() on all PlannedStmts in one pass during GetCachedPlan(),
before any statement executes. If a later statement has
initial-pruning expressions that read data modified by an earlier one,
pruning can see stale results.
There's also a memory lifetime issue: PortalRunMulti() calls
MemoryContextDeleteChildren(portalContext) between statements, which
destroys EStates prepared for later statements.
Here's a concrete case demonstrating the semantic issue:

create table multistmt_pt (a int, b int) partition by list (a);
create table multistmt_pt_1 partition of multistmt_pt for values in (1);
create table multistmt_pt_2 partition of multistmt_pt for values in (2);
insert into multistmt_pt values (1, 0), (2, 0);

create table prune_config (val int);
insert into prune_config values (1);

create function get_prune_val() returns int as $$
  select val from prune_config;
$$ language sql stable;

-- rule action runs first, updating prune_config before the
-- original statement's pruning would normally be evaluated
create rule config_upd_rule as on update to multistmt_pt
  do also update prune_config set val = 2;

set plan_cache_mode to force_generic_plan;

prepare multi_q as
  update multistmt_pt set b = b + 1 where a = get_prune_val();

execute multi_q;  -- creates the generic plan

-- reset for the real test
update prune_config set val = 1;
update multistmt_pt set b = 0;

-- second execute reuses the plan
execute multi_q;

select * from multistmt_pt order by a;


Without the patch: the rule action updates prune_config to val=2
first, then after the CommandCounterIncrement() the original
statement's initial pruning calls get_prune_val(), gets 2, prunes to
multistmt_pt_2, and updates it correctly: (1, 0), (2, 1).

With the patch as it stood: both statements' pruning runs during
GetCachedPlan() before either executes. The original statement's
pruning sees val=1, prunes to multistmt_pt_1, and multistmt_pt_2 is
never touched.

The fix is to skip pruning-aware locking for CachedPlans containing
multiple PlannedStmts, falling back to locking all partitions.
Single-statement plans are unchanged.
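
To make the ordering problem easier to see, here is a small standalone C model of the three orderings described above. It is only a sketch, not PostgreSQL code: DemoDb, rule_action(), initial_pruning(), update_partition() and the run_* functions are hypothetical names made up for this illustration. It also assumes that the qual a = get_prune_val() is checked again at run time, so under the early-prep ordering the model leaves both rows unchanged.

```c
#include <assert.h>

/* Hypothetical model of the test tables; not a PostgreSQL structure. */
typedef struct DemoDb
{
	int		config_val;		/* models prune_config.val */
	int		b[3];			/* b[1], b[2] model column b of rows a=1, a=2 */
} DemoDb;

/* Models the rule action: update prune_config set val = 2 */
static void
rule_action(DemoDb *db)
{
	db->config_val = 2;
}

/* Models initial pruning: get_prune_val() selects the surviving partition */
static int
initial_pruning(const DemoDb *db)
{
	return db->config_val;
}

/* Models the UPDATE running on the one partition that survived pruning */
static void
update_partition(DemoDb *db, int partition)
{
	/* assumption: the qual a = get_prune_val() is rechecked at run time */
	if (partition == db->config_val)
		db->b[partition] += 1;
}

/* Unpatched order: pruning for the UPDATE happens after the rule action */
static void
run_without_patch(DemoDb *db)
{
	rule_action(db);
	update_partition(db, initial_pruning(db));
}

/* Earlier patch version: all pruning is done before anything executes */
static void
run_with_early_prep(DemoDb *db)
{
	int		survivor = initial_pruning(db);	/* still sees val = 1 */

	rule_action(db);
	update_partition(db, survivor);
}

/* Fixed behavior: multi-statement plans skip early pruning entirely */
static void
run_with_fix(DemoDb *db, int n_planned_stmts)
{
	if (n_planned_stmts > 1)
		run_without_patch(db);	/* lock everything, prune at start time */
	else
		update_partition(db, initial_pruning(db));
}
```

Starting each run from val = 1 and b = 0 everywhere, only run_with_early_prep() misses the update to the a=2 row in this model; the other two orderings agree with each other.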

Since multi-statement plans are now excluded, CachedPlanPrepData no
longer needs a list of EStates -- it carries a single EState pointer.
This simplifies the plumbing throughout: PortalData,
PortalDefineQuery, SPI, and EXPLAIN all pass a single optional EState
instead of walking parallel lists. The next_prep_estate() helper is
gone.

Attached is the updated set.

--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v10-0005-Reuse-partition-pruning-results-in-parallel-work.patch (15.8K, 2-v10-0005-Reuse-partition-pruning-results-in-parallel-work.patch)
From 33fff6e090d9c713413a68ef2bdf9721f7e7f95b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:57 +0900
Subject: [PATCH v10 5/5] Reuse partition pruning results in parallel workers

Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.

Factor the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. Parallel workers need to set up pruning state without
performing initial pruning, since they receive the leader's results
instead.

Introduce CheckInitialPruningResultsInWorker() (debug builds only)
to verify that the results match what the worker would compute.
This check helps catch inconsistencies across leader and worker
pruning logic.
---
src/backend/executor/execMain.c | 25 +++++--
src/backend/executor/execParallel.c | 108 ++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 44 ++++++++---
src/backend/utils/cache/plancache.c | 2 +-
src/include/executor/execPartition.h | 1 +
src/include/executor/executor.h | 3 +-
6 files changed, 161 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 051b5d7bfcf..659557189ce 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -207,7 +207,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
queryDesc->params,
CurrentResourceOwner,
- eflags);
+ eflags, true);
}
#ifdef USE_ASSERT_CHECKING
else
@@ -330,7 +330,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
* ExecutorPrep: build initial executor state for a PlannedStmt.
*
* Performs range table initialization, permission checks, and initial
- * partition pruning if partPruneInfos are present.
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true; false in a parallel worker.
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
@@ -341,7 +342,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
- int eflags)
+ int eflags, bool do_initial_pruning)
{
ResourceOwner oldowner;
EState *estate;
@@ -378,14 +379,22 @@ ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
CurrentResourceOwner = owner;
/*
- * Set up PartitionPruneState structures and perform initial partition
- * pruning to compute the subset of child subplans that will be
- * executed. The results, which are bitmapsets of selected child
- * indexes, are saved in es_part_prune_results, parallel to
+ * Set up PartitionPruneState structures needed for initial
+ * partition pruning.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to
+ * compute the subset of child subplans that will be executed.
+ * The results, which are bitmapsets of selected child indexes,
+ * are saved in es_part_prune_results, parallel to
* es_part_prune_infos. RT indexes of surviving partitions are
* added to es_unpruned_relids.
+ *
+ * Parallel workers pass false here and instead receive the
+ * leader's pruning results via shared memory.
*/
- ExecDoInitialPruning(estate);
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
CurrentResourceOwner = oldowner;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..2de4b35a16e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, 0, false);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 2a3af006f77..47322614aad 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1942,6 +1942,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1967,15 +1970,40 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ *
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
* assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
@@ -1996,18 +2024,12 @@ ExecDoInitialPruning(EState *estate)
ListCell *lc;
Assert(estate->es_part_prune_results == NULL);
- foreach(lc, estate->es_part_prune_infos)
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment. RT indexes
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index b0c4d62564d..6c178c461a7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -2100,7 +2100,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
}
prep_estate = ExecutorPrep(plannedstmt, cprep->params,
- cprep->owner, cprep->eflags);
+ cprep->owner, cprep->eflags, true);
Assert(prep_estate);
cprep->prep_estate = prep_estate;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fac5bef1384..37195312bce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -240,7 +240,8 @@ extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern EState *ExecutorPrep(PlannedStmt *pstmt,
ParamListInfo params,
ResourceOwner owner,
- int eflags);
+ int eflags,
+ bool do_initial_pruning);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
--
2.47.3
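
The leader-to-worker handoff in the 0005 patch above (flatten to a string, size the chunk as strlen() + 1, memcpy it into shared memory, parse it back in the worker) can be sketched in standalone C roughly as follows. This is only an illustration under stated assumptions: DemoSet and the demo_* functions are hypothetical, malloc() stands in for shm_toc_allocate(), and the "(b 1 3)" text form merely imitates how a set of integers might be written out; it does not use nodeToString() or stringToNode().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_MEMBER 64

/* Hypothetical fixed-size set standing in for a Bitmapset */
typedef struct DemoSet
{
	unsigned char member[DEMO_MAX_MEMBER];
} DemoSet;

/* Leader side: flatten the set to text such as "(b 1 3)"; caller frees */
static char *
demo_set_to_string(const DemoSet *set)
{
	/* worst case: "(b" + 64 members of up to 3 chars each + ")" + NUL */
	size_t		cap = 4 + DEMO_MAX_MEMBER * 3;
	char	   *buf = malloc(cap);
	size_t		off;
	int			i;

	if (buf == NULL)
		return NULL;
	off = (size_t) snprintf(buf, cap, "(b");
	for (i = 0; i < DEMO_MAX_MEMBER; i++)
	{
		if (set->member[i])
			off += (size_t) snprintf(buf + off, cap - off, " %d", i);
	}
	snprintf(buf + off, cap - off, ")");
	return buf;
}

/* Copy the string, including its terminating NUL, into a "shared" chunk */
static char *
demo_publish(const char *data)
{
	size_t		len = strlen(data) + 1;	/* + 1 keeps the NUL terminator */
	char	   *space = malloc(len);	/* stand-in for shm_toc_allocate() */

	if (space != NULL)
		memcpy(space, data, len);
	return space;
}

/* Worker side: rebuild the set from the text form */
static void
demo_string_to_set(const char *str, DemoSet *set)
{
	const char *p = str + 2;	/* skip the leading "(b" */

	memset(set, 0, sizeof(*set));
	while (*p != ')' && *p != '\0')
	{
		char	   *end;
		long		v = strtol(p, &end, 10);

		if (end == p)
			break;				/* nothing parsable left */
		if (v >= 0 && v < DEMO_MAX_MEMBER)
			set->member[v] = 1;
		p = end;
	}
}

/* Returns nonzero when both sets have exactly the same members */
static int
demo_set_equal(const DemoSet *a, const DemoSet *b)
{
	return memcmp(a->member, b->member, sizeof(a->member)) == 0;
}
```

A round trip through demo_set_to_string(), demo_publish() and demo_string_to_set() should give the worker a set equal to the leader's. That equality is the property the patch's debug-only cross-check is meant to guard.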
[application/octet-stream] v10-0001-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 3-v10-0001-Refactor-executor-s-initial-partition-pruning-se.patch)
From 6b2a9740b49a5238569cfeeb11fa632225ec2cfb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v10 1/5] Refactor executor's initial partition pruning setup

Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.

Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().

No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions would be added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() would have already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
[application/octet-stream] v10-0002-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (23.5K, 4-v10-0002-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 4e849ce0af12963ee2040f187f4cb0bad1c2851e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v10 2/5] Introduce ExecutorPrep and refactor executor startup

Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorStart() calls it to build the EState, keeping
behavior unchanged.

If QueryDesc->estate is already set when ExecutorStart() is called,
the existing EState is reused and ExecutorPrep() is skipped. This
allows a later commit to supply a pre-built EState from outside
the executor.

Add scaffolding for carrying an optional prep EState through
CreateQueryDesc, PortalDefineQuery, and SPI. All callers currently
pass NULL; the next commit populates these to enable pruning-aware
locking in cached plans.

In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 4 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 158 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 4 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 19 +++-
src/backend/utils/mmgr/portalmem.c | 7 ++
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 7 ++
src/include/utils/portal.h | 2 +
19 files changed, 195 insertions(+), 50 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..b9bd5ba7078 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1011,7 +1011,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e4b70166b0e..24c0c235fd3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -372,7 +372,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -494,7 +494,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -552,7 +553,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b98801d08f2..939e7a632f0 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1174,7 +1174,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..c24d97f7e5a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NULL,
cplan);
/*
@@ -659,7 +660,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, NULL,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..cc7794f58db 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -147,7 +148,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +173,70 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +326,68 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: build initial executor state for a PlannedStmt.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present.
+ *
+ * Returns an EState that the caller must either pass to ExecutorStart()
+ * for reuse or free via FreeExecutorState() if execution will not proceed.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Track resources acquired during pruning under the given
+ * ResourceOwner, which may differ from CurrentResourceOwner
+ * when ExecutorPrep() is called outside ExecutorStart().
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures and perform initial partition
+ * pruning to compute the subset of child subplans that will be
+ * executed. The results, which are bitmapsets of selected child
+ * indexes, are saved in es_part_prune_results, parallel to
+ * es_part_prune_infos. RT indexes of surviving partitions are
+ * added to es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,37 +963,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..952a784c924 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1369,7 +1369,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..32c9d987c59 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NULL,
cplan);
/*
@@ -2695,7 +2696,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ NULL);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b3563113219..ccdb6c01071 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NULL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..42ef3e82f82 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,8 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estate);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1265,7 +1274,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, portal->prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1283,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, portal->prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..0ecda763d21 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,11 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If prep_estate is not NULL, it is an EState created by ExecutorPrep()
+ * during GetCachedPlan(). It will be passed to ExecutorStart() to avoid
+ * redoing range table setup and pruning. The portal takes ownership;
+ * the EState must have been allocated in the portal's memory context.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +291,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ EState *prep_estate,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +305,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estate = prep_estate;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..71ebe38bc86 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -64,7 +64,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..fac5bef1384 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,12 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ int eflags);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..a59e96fa11e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ EState *prep_estate; /* EState from ExecutorPrep() if any */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ EState *prep_estate,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
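For readers skimming the patch above, the prep-then-start handoff can be sketched as a toy model. This is only an illustration under stated assumptions: ToyEState, ToyQueryDesc, toy_executor_prep() and toy_executor_start() are hypothetical stand-ins invented here, not PostgreSQL types or functions. The only behavior modeled is the one the patch describes: start reuses a prepared state when one is supplied and otherwise creates it itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are PostgreSQL types. */
typedef struct ToyEState
{
    int pruning_done;       /* set once by toy_executor_prep() */
} ToyEState;

typedef struct ToyQueryDesc
{
    ToyEState *estate;      /* NULL means "prep has not run yet" */
} ToyQueryDesc;

/* Counts how often the (assumed expensive) prep step actually runs. */
static int prep_calls = 0;

/* Models the prep step: build the state and do the one-time work. */
static ToyEState *
toy_executor_prep(void)
{
    ToyEState *estate = calloc(1, sizeof(ToyEState));

    if (estate == NULL)
        abort();
    estate->pruning_done = 1;
    prep_calls++;
    return estate;
}

/* Models the start step: reuse a supplied state, else prepare one now. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        qd->estate = toy_executor_prep();

    /* Either way, the one-time work must have happened exactly once. */
    assert(qd->estate->pruning_done);
}
```

Calling toy_executor_start() on a descriptor with a NULL estate runs prep implicitly; calling it on a descriptor that already carries a prepared state leaves prep_calls unchanged, which is the saving the real patch is after.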
[application/octet-stream] v10-0003-Use-pruning-aware-locking-in-cached-plans.patch (47.3K, 5-v10-0003-Use-pruning-aware-locking-in-cached-plans.patch)
From 648b9f5c89069692bbb46cf579576be50a9147f2 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 18:15:39 +0900
Subject: [PATCH v10 3/5] Use pruning-aware locking in cached plans
Extend GetCachedPlan()'s lock acquisition to perform initial
partition pruning via ExecutorPrep(), then lock only the surviving
partitions. This avoids unnecessary locking of pruned partitions
when reusing a generic cached plan.

Introduce CachedPlanPrepData to carry the EState created by
ExecutorPrep() through the plan caching layer. The prep_estate
field is populated when GetCachedPlan() prepares a reused
single-statement generic plan. Adjust call sites in SPI,
portals, and EXPLAIN to propagate this to ExecutorStart().

Disable pruning-aware locking for multi-statement CachedPlans, which
arise from rule rewriting. PortalRunMulti() executes such statements
sequentially with CommandCounterIncrement() between them, so later
statements' pruning expressions may see different results depending
on when they are evaluated. Evaluating all statements' pruning
upfront during GetCachedPlan() would produce stale results for later
statements. Additionally, PortalRunMulti() calls
MemoryContextDeleteChildren(portalContext) between statements, which
would destroy EStates prepared for later statements. The fallback
to locking all partitions is safe and sufficient here; multi-statement
plans from rule rewriting are uncommon.

Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.

To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.

Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning and use-after-free.

Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this was
an optimization (avoid processing pruned partitions); it is now a
correctness requirement, because pruned partitions may not be locked.
ExecGetRangeTableRelation() already enforces this with an error when
called on a pruned relation.
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/execMain.c | 4 +
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 24 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 1 +
src/backend/utils/cache/plancache.c | 246 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 38 ++-
src/test/regress/expected/partition_prune.out | 184 +++++++++++++
src/test/regress/expected/plancache.out | 63 +++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++
src/test/regress/sql/plancache.sql | 52 ++++
17 files changed, 769 insertions(+), 24 deletions(-)
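The locking rules in the commit message above can be summarized in a small toy model. This is a sketch under assumptions, not the patch's code: relation sets are represented as bits in a uint32_t rather than Bitmapsets, and ToyRelids, TOY_REL(), toy_relids_to_lock() and toy_is_unpruned() are hypothetical names introduced only for illustration. It models three statements from the message: a single-statement plan locks the unprunable relations plus the pruning survivors, the first result relations are locked even if pruned, and a multi-statement plan falls back to locking everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a set of range-table indexes, one bit per index (1..31). */
typedef uint32_t ToyRelids;

#define TOY_REL(rti) ((ToyRelids) 1u << (rti))

/*
 * Decide which relations to lock for a reused generic plan.
 *
 * Multi-statement plans get no pruning-aware locking, so every relation
 * is locked. Otherwise lock the relations that can never be pruned, the
 * partitions that survived initial pruning, and the first result
 * relations, which must be locked even when pruned.
 */
static ToyRelids
toy_relids_to_lock(ToyRelids all_relids,
                   ToyRelids unprunable,
                   ToyRelids survivors,
                   ToyRelids first_result_rels,
                   int n_stmts)
{
    if (n_stmts > 1)
        return all_relids;

    return unprunable | survivors | first_result_rels;
}

/*
 * The membership test an extension would perform before opening a
 * relation: a relation outside the given set may be unlocked.
 */
static bool
toy_is_unpruned(ToyRelids unpruned_relids, int rti)
{
    return (unpruned_relids & TOY_REL(rti)) != 0;
}
```

With relations 1 to 4, relation 1 unprunable, relation 3 surviving pruning and relation 2 recorded as a first result relation, the single-statement case locks 1, 2 and 3 and leaves 4 alone, while the two-statement case locks all four.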
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c24d97f7e5a..621fd30fd5e 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -156,6 +156,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -195,8 +196,11 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
+ Assert(cprep.prep_estate == NULL || list_length(plan_list) == 1);
/*
* DO NOT add any logic that could possibly throw an error between
@@ -207,7 +211,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NULL,
+ cprep.prep_estate,
cplan);
/*
@@ -577,6 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -633,8 +638,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -655,12 +665,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(cprep.prep_estate == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, NULL,
+ ExplainOnePlan(pstmt, cprep.prep_estate,
into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index cc7794f58db..051b5d7bfcf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -334,6 +334,10 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
+ * GetCachedPlan() uses this to determine, based on initial pruning
+ * results, which partitions to lock; if the resulting EState is not
+ * delivered to ExecutorStart(), the executor would operate on unlocked
+ * relations. See the assert checks in standard_ExecutorStart().
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 952a784c924..c0ca72b38dd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -699,6 +699,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..9230f2b554f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4865,8 +4865,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4880,6 +4880,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 32c9d987c59..eb9552f85db 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1580,6 +1580,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1660,8 +1661,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
+ Assert(cprep.prep_estate == NULL || list_length(stmt_list) == 1);
if (!plan->saved)
{
@@ -1670,7 +1675,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
* so must copy the plan into the portal's context. An error here
* will result in leaking our refcount on the plan, but it doesn't
* matter because the plan is unsaved and hence transient anyway.
+ *
+ * Unsaved plans use custom plans, so prep should be a no-op.
*/
+ Assert(cprep.prep_estate == NULL);
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
MemoryContextSwitchTo(oldcontext);
@@ -1686,7 +1694,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NULL,
+ cprep.prep_estate,
cplan);
/*
@@ -2104,7 +2112,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2510,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2575,8 +2585,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
@@ -2616,6 +2629,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ Assert(cprep.prep_estate == NULL || list_length(stmt_list) == 1);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
@@ -2697,7 +2711,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0,
- NULL);
+ cprep.prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..8c9956e687e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. ExecInitModifyTable() asserts
+ * that the recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ccdb6c01071..487258641a5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1637,6 +1637,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2018,7 +2019,10 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
+ Assert(cprep.prep_estate == NULL || list_length(cplan->stmt_list) == 1);
/*
* Now we can define the portal.
@@ -2031,7 +2035,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NULL,
+ cprep.prep_estate,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 42ef3e82f82..b52c4c619ee 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1214,6 +1214,7 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ Assert(portal->prep_estate == NULL || list_length(portal->stmts) == 1);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..b0c4d62564d 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL and the generic plan contains only a single
+ * statement, ExecutorPrep() is applied to that PlannedStmt to compute the set
+ * of partitions that survive initial runtime pruning in order to only lock
+ * them. The EState is saved in cprep->prep_estate, which must be passed to
+ * ExecutorStart() for reuse.
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +958,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +992,19 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Multi-statement CachedPlans (from rule rewriting) must not
+ * use pruning-aware locking, because later statements' pruning
+ * expressions could see stale results if evaluated before
+ * earlier statements have executed.
+ */
+ if (cprep && list_length(plan->stmt_list) > 1)
+ cprep = NULL;
+
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1026,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1312,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a single-statement generic plan is reused,
+ * the function performs initial pruning via ExecutorPrep() and locks only
+ * the surviving partitions. The resulting EState is stored in
+ * cprep->prep_estate and must be delivered to ExecutorStart() via
+ * QueryDesc->estate (or the equivalent portal/SPI path). Failure
+ * to do so means the executor will operate on relations for which
+ * locks were never acquired. Passing NULL for cprep is always safe;
+ * all partitions are locked as before. Multi-statement plans also
+ * fall back to locking all partitions.
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1332,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1355,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1944,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1997,190 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given single-statement PlannedStmt list.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - stores the EState in cprep->prep_estate
+ *
+ * On release, it:
+ * - uses the EState in cprep->prep_estate to determine which
+ * relids to unlock
+ *
+ * Memory allocation for the EState happens in cprep->context.
+ * Locks are acquired using cprep->owner.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ EState *prep_estate;
+
+ /* Check cprep before dereferencing it to switch contexts. */
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+
+ /*
+ * When releasing locks, use the EState created during acquisition to
+ * determine which relids to unlock.
+ */
+ prep_estate = cprep->prep_estate;
+ Assert(!acquire || prep_estate == NULL);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params,
+ cprep->owner, cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estate = prep_estate;
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * We must always lock the first result relation of each
+ * ModifyTable node in the plan, that is, the relations listed in
+ * plannedstmt->firstResultRels, even if they were pruned, to
+ * satisfy the executor assumptions described in
+ * ExecInitModifyTable(). This can be wasteful, because we
+ * may not need to use the first result relation at all if other
+ * result relations are unpruned and thus sufficient for the
+ * ModifyTable node's needs. Unfortunately, we don't have a per-node
+ * unpruned_relids set to determine whether other result relations
+ * are included.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Dispose of EState built during pruning-aware lock acquisition.
+ *
+ * This is used when CheckCachedPlan() discovers that a CachedPlan has
+ * become invalid after AcquireExecutorLocksUnpruned() has already run.
+ * The execution locks have already been released by that point; this
+ * function frees the EState that the executor will never see.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ EState *prep_estate;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+
+ prep_estate = cprep->prep_estate;
+ Assert(prep_estate);
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ CurrentResourceOwner = oldowner;
+
+ cprep->prep_estate = NULL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27758ec16fe..4fd9d9bcc56 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..55279cbbda8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..1a153b816eb 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -27,6 +27,9 @@
typedef struct Query Query;
typedef struct RawStmt RawStmt;
+/* to avoid including execnodes.h */
+typedef struct EState EState;
+
/* possible values for plan_cache_mode */
typedef enum
{
@@ -196,6 +199,38 @@ typedef struct CachedExpression
dlist_node node; /* link in global list of CachedExpressions */
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for a CachedPlan's PlannedStmt,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estate is populated when GetCachedPlan() prepares a reused
+ * single-statement generic plan. Multi-statement plans (from rule
+ * rewriting) fall back to locking all partitions and leave this NULL.
+ *
+ * prep_estate must reach ExecutorStart() to be adopted for execution.
+ * If the plan is found invalid after locking, CachedPlanPrepCleanup()
+ * frees it instead, before the plan is rebuilt.
+ *
+ * The EState is allocated in 'context' and its resources are tracked
+ * under 'owner', which the caller sets to match the execution
+ * environment (e.g., portal context and resowner).
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ */
+typedef struct CachedPlanPrepData
+{
+ EState *prep_estate; /* EState for the PlannedStmt */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +275,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..61781389d2f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,187 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..3043dbfac2d 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..692415a8d9f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,119 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..6a8b8787de6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
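As a reading aid for the lock-set computation in AcquireExecutorLocksUnpruned() above, here is a minimal sketch in plain C. It is a toy model, not PostgreSQL code: ToyRelids, toy_member() and toy_second_pass_locks() are hypothetical names, and a 32-bit mask stands in for the Bitmapset operations (bms_difference(), bms_add_member()) that the patch actually uses. It only shows which range table indexes end up in the second locking pass: the relations that survived initial pruning, minus the unprunable ones already locked in the first pass, plus the first result relation of each ModifyTable node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of range table indexes: bit i = RT index i. */
typedef uint32_t ToyRelids;

/* Build a one-member set for the given (1-based) range table index. */
static ToyRelids
toy_member(int rtindex)
{
    assert(rtindex >= 1 && rtindex < 32);
    return (ToyRelids) 1 << rtindex;
}

/*
 * Compute the relations locked in the second pass: the unpruned set with
 * the already-locked unprunable relations filtered out, then every first
 * result relation added back even if pruning removed it.
 */
static ToyRelids
toy_second_pass_locks(ToyRelids unpruned, ToyRelids unprunable,
                      const int *first_result_rels, size_t nfirst)
{
    ToyRelids lock_relids = unpruned & ~unprunable;

    for (size_t i = 0; i < nfirst; i++)
        lock_relids |= toy_member(first_result_rels[i]);

    return lock_relids;
}
```

For instance, with the parent at index 1 (unprunable), partitions at indexes 2 to 4, only partition 3 surviving pruning and partition 2 being the first result relation, the second pass would cover indexes 2 and 3 in this model. For a plain SELECT (no result relations) it would cover index 3 only.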
[application/octet-stream] v10-0004-Make-SQL-function-executor-track-ExecutorPrep-st.patch (7.7K, 6-v10-0004-Make-SQL-function-executor-track-ExecutorPrep-st.patch)
From 5769f6ca7c9ffcee1b51d27105c780c5d6102f55 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v10 4/5] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.

At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.

This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.

Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 27 ++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 107 insertions(+), 2 deletions(-)
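The reparenting step described in the commit message can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyContext, toy_set_parent() and toy_is_descendant() are hypothetical names that mimic the parent/child bookkeeping of memory contexts, not the real MemoryContextSetParent() implementation, and the example assumes for illustration that the function's subcontext is itself a child of fcontext.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory context: only the tree links are modeled. */
typedef struct ToyContext
{
    const char *name;
    struct ToyContext *parent;
    struct ToyContext *firstchild;
    struct ToyContext *nextsibling;
} ToyContext;

/* Detach cxt from its current parent's child list, if it has a parent. */
static void
toy_unlink(ToyContext *cxt)
{
    ToyContext **link;

    if (cxt->parent == NULL)
        return;

    for (link = &cxt->parent->firstchild; *link != NULL;
         link = &(*link)->nextsibling)
    {
        if (*link == cxt)
        {
            *link = cxt->nextsibling;
            break;
        }
    }
    cxt->parent = NULL;
    cxt->nextsibling = NULL;
}

/* Move cxt (with its whole subtree) under new_parent. */
static void
toy_set_parent(ToyContext *cxt, ToyContext *new_parent)
{
    assert(cxt != new_parent);
    toy_unlink(cxt);
    cxt->parent = new_parent;
    if (new_parent != NULL)
    {
        cxt->nextsibling = new_parent->firstchild;
        new_parent->firstchild = cxt;
    }
}

/* Return 1 if ancestor is reachable by following cxt's parent links. */
static int
toy_is_descendant(const ToyContext *cxt, const ToyContext *ancestor)
{
    for (cxt = cxt->parent; cxt != NULL; cxt = cxt->parent)
    {
        if (cxt == ancestor)
            return 1;
    }
    return 0;
}
```

In this model the prep estate's query context starts out as a direct child of fcontext. After toy_set_parent(&query_cxt, &subcontext) it hangs under the subcontext instead, so whatever resets or deletes the subcontext's children would also cover the prep state.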
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c0ca72b38dd..2be816b6a75 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -73,6 +73,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -658,6 +659,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
/*
* Clean up after previous query, if there was one.
@@ -696,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -721,6 +732,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ Assert(cprep.prep_estate == NULL || list_length(fcache->cplan->stmt_list) == 1);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
@@ -765,6 +777,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = cprep.prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1363,6 +1376,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per-call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1371,7 +1393,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1462,6 +1484,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 3043dbfac2d..547846b2945 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -460,4 +460,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
deallocate inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 6a8b8787de6..532fa58518b 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -274,4 +274,38 @@ deallocate inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-27 09:00 ` Amit Langote <[email protected]>
2026-04-04 12:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-03-27 09:00 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, Mar 26, 2026 at 6:24 PM Amit Langote <[email protected]> wrote:
> On Wed, Mar 25, 2026 at 4:39 PM Amit Langote <[email protected]> wrote:
> > On Fri, Mar 20, 2026 at 2:20 AM Amit Langote <[email protected]> wrote:
> > > On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> > > Stepping back -- the core question is whether running executor logic
> > > (pruning) inside GetCachedPlan() is acceptable at all. The plan cache
> > > and executor have always had a clean boundary: plan cache locks
> > > everything, executor runs. This optimization necessarily crosses that
> > > line, because the information needed to decide which locks to skip
> > > (pruning results) can only come from executor machinery.
> > >
> > > The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
> > > limited subset of executor work (range table init, permissions,
> > > pruning), carry the results out through CachedPlanPrepData, and leave
> > > the CachedPlan itself untouched. The executor already has a multi-step
> > > protocol: start/run/end. prep/start/run/end is just a finer
> > > decomposition of what InitPlan() was already doing inside
> > > ExecutorStart().
> > >
> > > Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
> > > function support) and 0005 (parallel worker reuse) are useful
> > > follow-ons but not essential. The optimization works without them for
> > > most cases, and they can be reviewed and committed separately.
> > >
> > > If there's a cleaner way to avoid locking pruned partitions without
> > > the plumbing this patch adds, I haven't found it in the year since the
> > > revert. I'd welcome a pointer if you see one. Failing that, I think
> > > this is the right trade-off, but it's a judgment call about where to
> > > hold your nose.
> > >
> > > Tom, I'd value your opinion on whether this approach is something
> > > you'd be comfortable seeing in the tree.
> >
> > Attached is an updated set with some cleanup after another pass.
> >
> > - Removed ExecCreatePartitionPruneStates() from 0001. In 0001-0003,
> > ExecDoInitialPruning() handles both setup and pruning internally; the
> > split isn't needed yet.
> >
> > - Tightened commit messages to describe what each commit does now, not
> > what later commits will use it for. In particular, 0002 is upfront
> > that the portal/SPI/EXPLAIN plumbing is scaffolding that 0003 lights
> > up.
> >
> > - Updated setrefs.c comment for firstResultRels to drop a blanket
> > claim about one ModifyTable per query level.
> >
> > As before, 0001-0003 is the focus, plus maybe 0004, which teaches the
> > new pruning-aware GetCachedPlan() contract to its relatively new user
> > in functions.c.
>
> While reviewing the patch more carefully, I realized there's a
> correctness issue when rule rewriting causes a single statement to
> expand into multiple PlannedStmts in one CachedPlan.
>
> PortalRunMulti() executes those statements sequentially, with
> CommandCounterIncrement() between them, so Q2's ExecutorStart()
> normally sees the effects of Q1.
>
> With the patch, though, AcquireExecutorLocksUnpruned() runs
> ExecutorPrep() on all PlannedStmts in one pass during GetCachedPlan(),
> before any statement executes. If a later statement has
> initial-pruning expressions that read data modified by an earlier one,
> pruning can see stale results.
>
> There's also a memory lifetime issue: PortalRunMulti() calls
> MemoryContextDeleteChildren(portalContext) between statements, which
> destroys EStates prepared for later statements.
>
> Here's a concrete case demonstrating the semantic issue:
>
> create table multistmt_pt (a int, b int) partition by list (a);
> create table multistmt_pt_1 partition of multistmt_pt for values in (1);
> create table multistmt_pt_2 partition of multistmt_pt for values in (2);
> insert into multistmt_pt values (1, 0), (2, 0);
>
> create table prune_config (val int);
> insert into prune_config values (1);
>
> create function get_prune_val() returns int as $$
> select val from prune_config;
> $$ language sql stable;
>
> -- rule action runs first, updating prune_config before the
> -- original statement's pruning would normally be evaluated
> create rule config_upd_rule as on update to multistmt_pt
> do also update prune_config set val = 2;
>
> set plan_cache_mode to force_generic_plan;
> prepare multi_q as
> update multistmt_pt set b = b + 1 where a = get_prune_val();
> execute multi_q; -- creates the generic plan
>
> -- reset for the real test
> update prune_config set val = 1;
> update multistmt_pt set b = 0;
>
> -- second execute reuses the plan
> execute multi_q;
> select * from multistmt_pt order by a;
>
> Without the patch: the rule action updates prune_config to val=2
> first, then after CCI the original statement's initial pruning calls
> get_prune_val(), gets 2, prunes to multistmt_pt_2, and updates it
> correctly: (1, 0), (2, 1).
>
> With the patch as it stood: both statements' pruning runs during
> GetCachedPlan() before either executes. The original statement's
> pruning sees val=1, prunes to multistmt_pt_1, and multistmt_pt_2 is
> never touched.
>
> The fix is to skip pruning-aware locking for CachedPlans containing
> multiple PlannedStmts, falling back to locking all partitions.
> Single-statement plans are unchanged.
For good measure, I also verified that Tom's test case from last May
[1] that prompted the revert of the previous commit works correctly
with this patch. When the DO ALSO rule is created mid-execution, the
plan gets invalidated and rebuilt as a multi-statement CachedPlan,
which triggers the fallback to locking all partitions. No assertion
failures, no crashes.
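
The ordering problem in the quoted multi_q example can be modeled with a small, self-contained C sketch. It is a toy, not PostgreSQL code: ToyDb and the toy_* functions are hypothetical names, each partition holds a single row, and the model assumes the WHERE qual is re-evaluated at execution time against the post-CCI value. The quoted message itself only states that multistmt_pt_2 is never touched when pruning runs early.

```c
#include <assert.h>

/* Toy database state; all names here are illustrative only. */
typedef struct
{
    int prune_val;  /* stands in for prune_config.val */
    int b[3];       /* b[1], b[2]: column b of the row with a = 1, a = 2 */
} ToyDb;

/*
 * Update the rows of partition "scanned" that satisfy a = qual_val.
 * Partition p holds only the row with a = p, so the update hits that
 * row only when the scanned partition matches the qual value.
 */
static void
toy_update_partition(ToyDb *db, int scanned, int qual_val)
{
    assert(scanned == 1 || scanned == 2);
    if (scanned == qual_val)
        db->b[scanned] += 1;
}

/* Unpatched order: rule action runs, then pruning, then the update. */
static void
toy_execute_prune_late(ToyDb *db)
{
    int scanned;

    db->prune_val = 2;          /* rule action: set val = 2 */
    scanned = db->prune_val;    /* initial pruning sees val = 2 */
    toy_update_partition(db, scanned, db->prune_val);
}

/* Problem order: pruning runs at plan fetch, before the rule action. */
static void
toy_execute_prune_early(ToyDb *db)
{
    int scanned = db->prune_val;    /* pruning sees the stale val = 1 */

    db->prune_val = 2;              /* rule action: set val = 2 */
    toy_update_partition(db, scanned, db->prune_val);
}
```

Starting from prune_val = 1 and b = 0 in both partitions, toy_execute_prune_late() increments b in partition 2, matching the (1, 0), (2, 1) result quoted above, while toy_execute_prune_early() scans only partition 1 and leaves partition 2 untouched.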
--
Thanks, Amit Langote
[1] https://postgr.es/m/[email protected]
* Re: generic plans and "initial" pruning
@ 2026-04-04 12:10 ` Amit Langote <[email protected]>
2026-05-27 12:03 ` Re: generic plans and "initial" pruning Thom Brown <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-04-04 12:10 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Attached is a redesigned version. While working on the previous
design, I grew increasingly uncomfortable with CachedPlanPrepData --
it was smuggling executor state out of GetCachedPlan() through an
out-parameter, which papered over the real problem: GetCachedPlan()
was doing too much. The main change in this version is architectural:
GetCachedPlan() no longer acquires execution locks. Callers now own
that responsibility, which is natural because each call site iterates
stmt_list differently and manages execution state in its own way --
and it lets them choose between conservative lock-all and
pruning-aware locking where appropriate.
Non-portal call sites remain on the conservative path for now.
_SPI_execute_plan requires care around snapshot setup, which happens
after plan fetch rather than before. SQL functions have a different
issue: init_execution_state() fetches the plan while postquel_start()
handles execution, with execution_state containers in between, making
it harder to thread a prepped QueryDesc through. The portal path and
EXPLAIN EXECUTE cover the most common
prepared-statement-with-partitions workloads; the remaining sites can
be converted incrementally.
This is now starting to feel closer to what Tom suggested back in
January 2023 [1], where he proposed getting rid of
AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
lock acquisition out to callers. He noted that "we'd be pushing the
responsibility for looping back and re-planning out to fairly
high-level calling code" and that "we'd definitely be changing some
fundamental APIs." That is the direction I came around to over the
last couple of weeks while wrestling with CachedPlanPrepData. The
reverted approach also tried to follow Tom's direction but moved
locking into ExecutorStart(), which forced it to handle plan
invalidation from inside the executor by mutating the CachedPlan
in-place. This version moves locking out to the callers instead, so
the executor and plan cache never reach into each other.
The series is now four patches:
0001: Move execution lock acquisition out of GetCachedPlan(). Adds
AcquireExecutorLocks() as a caller-facing function with validity check
and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
portal retry logic. All callers are converted. No behavioral change.
0002: Refactor executor's initial partition pruning setup. Cleanup
only, no behavioral change.
0003: Introduce ExecutorPrep() and refactor executor startup. Factors
range table init, permission checks, and initial pruning out of
InitPlan(). Scaffolding for 0004; all callers still go through the
normal ExecutorStart() path.
0004: Use pruning-aware locking for single-statement cached plans.
Adds ExecutorPrepAndLock() which locks unprunable relations, runs
ExecutorPrep() to determine surviving partitions, then locks only
those. Extends PortalLockCachedPlan() with a pruning-aware path for
eligible plans. Multi-statement CachedPlans (from rule rewriting)
always use conservative locking. In principle, this could be relaxed
if the planner can prove that no pruning expression reads state
modified by an earlier statement, but that is left for a future patch.
Includes regression tests.
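
The lock sequence described for 0004 can be sketched in self-contained C, using a bitmask in place of a Bitmapset. This is a simplified model under stated assumptions, not the patch's code: RelidSet, ToyStmt, ToyLockTable, and the toy_* functions are hypothetical, pruning is reduced to a precomputed survivors set, and locks are not reference-counted as they are in the real lock manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t RelidSet;  /* bit i set: range table index i is a member */

typedef struct
{
    RelidSet unprunable;        /* relations not subject to pruning */
    RelidSet first_result_rels; /* first result rel of each ModifyTable */
    RelidSet survivors;         /* partitions that survive toy pruning */
    bool     is_valid;          /* cleared by a simulated invalidation */
} ToyStmt;

typedef struct
{
    RelidSet locked;                  /* currently locked relations */
    RelidSet invalidate_when_locking; /* locking these invalidates the plan */
} ToyLockTable;

static void
toy_lock(ToyLockTable *lt, ToyStmt *stmt, RelidSet relids)
{
    lt->locked |= relids;
    if (relids & lt->invalidate_when_locking)
        stmt->is_valid = false;     /* simulated concurrent DDL */
}

static void
toy_unlock(ToyLockTable *lt, RelidSet relids)
{
    lt->locked &= ~relids;
}

/*
 * Lock unprunable rels, "prune", then lock only the survivors plus the
 * first result relations.  Validity is checked after each locking step;
 * on failure every lock taken here is released.
 */
static bool
toy_prep_and_lock(ToyLockTable *lt, ToyStmt *stmt)
{
    RelidSet unpruned;
    RelidSet extra;

    toy_lock(lt, stmt, stmt->unprunable);
    if (!stmt->is_valid)
    {
        toy_unlock(lt, stmt->unprunable);
        return false;
    }

    /* Pruning result: unprunable rels plus surviving partitions. */
    unpruned = stmt->unprunable | stmt->survivors;

    /* Still to lock: survivors, plus first result rels even if pruned. */
    extra = (unpruned & ~stmt->unprunable) | stmt->first_result_rels;

    toy_lock(lt, stmt, extra);
    if (!stmt->is_valid)
    {
        toy_unlock(lt, extra | stmt->unprunable);
        return false;
    }
    return true;
}
```

In this model a pruned partition never appears in the locked set, while a first result relation stays locked even when every target partition is pruned, which mirrors the firstResultRels handling described for 0004.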
In case it's not clear, I'm not targeting v19 at this point. I'd like
to get this into v20 CF1 and would welcome review from anyone
interested.
--
Thanks,
Amit Langote
[1] https://www.postgresql.org/message-id/4191508.1674157166%40sss.pgh.pa.us
Attachments:
[application/octet-stream] v11-0004-Use-pruning-aware-locking-for-single-statement-c.patch (40.3K, 2-v11-0004-Use-pruning-aware-locking-for-single-statement-c.patch)
From f586635ab49f3027546a7bda4c4f6017b946f333 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 20:43:14 +0900
Subject: [PATCH v11 4/4] Use pruning-aware locking for single-statement cached
plans
For single-statement reused generic plans, perform initial partition
pruning before acquiring execution locks, then lock only the
surviving partitions.
Add ExecutorPrepAndLock() which encapsulates the pruning-aware lock
sequence: lock unprunable relations, call ExecutorPrep() to run
initial pruning, then lock survivors. Plan validity is checked
after each step; ExecutorPrepCleanup() handles the case where the
plan is invalidated between prep and execution.
Extend PortalLockCachedPlan() to use the pruning-aware path for
eligible plans (single-statement reused generic, non-utility).
All other cases continue using the conservative lock-all path
from the previous commit.
Track firstResultRels in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving ExecInitModifyTable()
assumptions about the first result relation being available.
Multi-statement CachedPlans (from rule rewriting) always use
conservative locking, since PortalRunMulti() executes statements
sequentially with CCI between them and later statements' pruning
expressions may depend on earlier ones' effects. In principle,
this could be relaxed if the planner can prove that no pruning
expression reads state modified by an earlier statement, but that
is left for a future patch.
Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning results.
Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this
was an optimization; it is now a correctness requirement, because
pruned partitions may not be locked.
---
src/backend/commands/explain.c | 45 +++--
src/backend/commands/prepare.c | 30 ++-
src/backend/executor/execMain.c | 142 ++++++++++++++
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/pquery.c | 54 ++++-
src/backend/utils/cache/plancache.c | 16 ++
src/include/commands/explain.h | 3 +-
src/include/executor/executor.h | 4 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 2 +
src/test/regress/expected/partition_prune.out | 184 ++++++++++++++++++
src/test/regress/expected/plancache.out | 63 ++++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++++
src/test/regress/sql/plancache.sql | 52 +++++
17 files changed, 720 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e4b70166b0e..60cd912ace1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -374,7 +374,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ NULL);
}
/*
@@ -498,7 +499,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters)
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -527,13 +529,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
/*
* We discard the output if we have no use for it. If we're explaining
* CREATE TABLE AS, we'd better use the appropriate tuple receiver, while
@@ -549,10 +544,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
else
dest = None_Receiver;
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ /*
+ * Create a QueryDesc for the query, or use the one provided by the
+ * caller. When reusing a prep QueryDesc, its snapshot was set at
+ * creation time; we push it as active for ExecutorStart and override the
+ * destination and instrument options, which were not known when the
+ * caller created it.
+ */
+ if (prep_qd)
+ {
+ PushActiveSnapshot(GetActiveSnapshot());
+ queryDesc = prep_qd;
+ Assert(queryDesc->dest == None_Receiver);
+ queryDesc->dest = dest;
+ queryDesc->instrument_options = instrument_option;
+ }
+ else
+ {
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv,
+ instrument_option);
+ }
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 03d7a98fc58..3bbbc052149 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -588,6 +588,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ QueryDesc *prep_qd = NULL;
if (es->memory)
{
@@ -640,8 +641,31 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
pstate->p_queryEnv);
plan_list = cplan->stmt_list;
- if (AcquireExecutorLocks(cplan))
+ if (!CachedPlanCanPrep(cplan, entry->plansource))
+ {
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ continue;
+ }
+
+ prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, plan_list),
+ query_string,
+ GetActiveSnapshot(),
+ InvalidSnapshot,
+ None_Receiver, /* ExplainOnePlan will fix */
+ paramLI,
+ pstate->p_queryEnv,
+ 0 /* ExplainOnePlan will fix */ );
+ if (ExecutorPrepAndLock(prep_qd,
+ CurrentResourceOwner,
+ es->generic ? EXEC_FLAG_EXPLAIN_GENERIC : 0,
+ &cplan->is_valid))
break;
+
+ /* Try again. */
+ ExecutorPrepCleanup(prep_qd);
+ FreeQueryDesc(prep_qd);
ReleaseCachedPlan(cplan, CurrentResourceOwner);
}
@@ -664,6 +688,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(prep_qd == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
@@ -671,7 +696,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ prep_qd);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 735c80e08a9..7333c0f66d5 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -324,6 +324,124 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * LockRangeTableRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksPrepared() and ExecutorPrepAndLock().
+ */
+static void
+LockRangeTableRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not fail
+ * if it's been dropped entirely --- we'll just transiently acquire a
+ * non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksPrepared
+ *
+ * Acquire or release execution locks using pruning results already computed
+ * by ExecutorPrep() and stored in queryDesc->estate.
+ *
+ * This is intended for single-statement reused generic-plan paths that
+ * choose pruning-aware locking instead of the conservative
+ * AcquireExecutorLocks() path.
+ */
+static void
+AcquireExecutorLocksPrepared(QueryDesc *queryDesc, bool acquire)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ Bitmapset *lock_relids;
+ ListCell *lc;
+
+ Assert(queryDesc != NULL);
+ Assert(estate != NULL);
+ Assert(plannedstmt != NULL);
+ Assert(plannedstmt->commandType != CMD_UTILITY);
+
+ lock_relids = bms_difference(estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Keep the first result relation of each ModifyTable locked even if
+ * pruning removed all target partitions. ExecInitModifyTable() relies on
+ * one such relation remaining available.
+ */
+ foreach(lc, plannedstmt->firstResultRels)
+ {
+ Index rti = lfirst_int(lc);
+
+ lock_relids = bms_add_member(lock_relids, rti);
+ }
+
+ LockRangeTableRelids(plannedstmt->rtable, lock_relids, acquire);
+
+ bms_free(lock_relids);
+
+}
+
+/*
+ * ExecutorPrepAndLock
+ * Perform pruning-aware locking for a single PlannedStmt.
+ *
+ * Locks unprunable relations first, then runs ExecutorPrep() to
+ * determine which partitions survive initial pruning, then locks
+ * only those survivors. Checks *is_valid after each locking step
+ * to detect plan invalidation (e.g., from concurrent DDL or DDL
+ * triggered by a pruning expression).
+ *
+ * Returns true if the plan is still valid and all needed locks are
+ * held. Returns false if the plan was invalidated at any point, in
+ * which case all acquired locks have been released and the caller
+ * should discard the QueryDesc and retry with a fresh plan.
+ */
+bool
+ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid)
+{
+ PlannedStmt *pstmt = queryDesc->plannedstmt;
+
+ /* Lock unprunable rels before pruning can access them. */
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, true);
+ if (!*is_valid)
+ {
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ /* Run pruning and lock survivors. */
+ ExecutorPrep(queryDesc, owner, eflags);
+ AcquireExecutorLocksPrepared(queryDesc, true);
+ if (!*is_valid)
+ {
+ AcquireExecutorLocksPrepared(queryDesc, false);
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ return true;
+}
+
/*
* ExecutorPrep
*
@@ -382,6 +500,30 @@ ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
CurrentResourceOwner = oldowner;
}
+/*
+ * ExecutorPrepCleanup
+ * Clean up an EState that was created by ExecutorPrep() but never
+ * passed to ExecutorStart(). This happens when the plan is
+ * invalidated between prep and execution, and the caller must
+ * discard the prepped state before retrying with a fresh plan.
+ *
+ * Unlike ExecutorEnd(), this does not expect a fully initialized
+ * plan state tree -- only the range table relations and the
+ * EState itself need to be freed.
+ */
+void
+ExecutorPrepCleanup(QueryDesc *queryDesc)
+{
+ EState *estate = queryDesc->estate;
+
+ if (estate == NULL)
+ return;
+
+ ExecCloseRangeTableRelations(estate);
+ FreeExecutorState(estate);
+ queryDesc->estate = NULL;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index dfd7b33aa9b..8bc5c36e09d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -5112,8 +5112,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksPrepared(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -5127,6 +5127,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4ec76ce31a9..ace1cbacc91 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ff0e875f2a2..6ee51f06920 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of initially
+ * prunable relations. We use bms_next_member() to get the
+ * lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition expansion
+ * preserves RT index order. ExecInitModifyTable() asserts that the
+ * recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 1b22515d56e..af732821139 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,7 +59,9 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
-static bool PortalLockCachedPlan(Portal portal);
+static bool PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **queryDesc_p);
/*
@@ -492,9 +494,14 @@ restart:
* the destination to DestNone.
*
* If the portal is backed by a cached plan, acquire execution
- * locks via PortalLockCachedPlan(). If the plan is
- * invalidated during locking, it replans and may change the
- * portal strategy, requiring us to restart PortalStart().
+ * locks via PortalLockCachedPlan(). For eligible plans
+ * (single-statement reused generic), this performs
+ * pruning-aware locking: it runs ExecutorPrep() on the
+ * QueryDesc to determine which partitions survive initial
+ * pruning, then locks only those. If the plan is invalidated
+ * during this process, it replans and rebuilds the QueryDesc.
+ * If replanning changes the portal strategy, we must restart
+ * PortalStart() to redispatch.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -506,7 +513,7 @@ restart:
0);
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, true, params, &queryDesc))
{
PopActiveSnapshot();
goto restart;
@@ -552,7 +559,7 @@ restart:
case PORTAL_ONE_MOD_WITH:
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -608,7 +615,7 @@ restart:
*/
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -1825,15 +1832,32 @@ EnsurePortalSnapshotExists(void)
* Acquire execution locks for a cached-plan-backed portal,
* retrying with a fresh plan if the current one is invalidated.
*
+ * If do_prep is true and the plan is eligible (single-statement reused
+ * generic plan), performs pruning-aware locking via ExecutorPrep() and
+ * populates portal->queryDesc with the prepped QueryDesc. Otherwise
+ * falls back to locking all relations in the plan.
+ *
* Returns true if replanning changed portal->strategy, meaning the
- * caller must redispatch. Returns false once locks are held.
+ * caller must redispatch. Returns false once locks are held and the
+ * plan is valid for execution.
*/
static bool
-PortalLockCachedPlan(Portal portal)
+PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd)
{
PortalStrategy start_strategy = portal->strategy;
- if (AcquireExecutorLocks(portal->cplan))
+ if (do_prep && CachedPlanCanPrep(portal->cplan, portal->plansource))
+ {
+ Assert(prep_qd);
+ if (ExecutorPrepAndLock(*prep_qd, portal->resowner, 0,
+ &portal->cplan->is_valid))
+ return false;
+ ExecutorPrepCleanup(*prep_qd);
+ FreeQueryDesc(*prep_qd);
+ }
+ else if (AcquireExecutorLocks(portal->cplan))
return false;
/* Replan. Locks will be taken freshly. */
@@ -1849,5 +1873,15 @@ PortalLockCachedPlan(Portal portal)
if (portal->strategy != start_strategy)
return true;
+ if (prep_qd)
+ {
+ Assert(list_length(portal->stmts) == 1);
+ *prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->sourceText,
+ GetActiveSnapshot(), InvalidSnapshot,
+ None_Receiver, params,
+ portal->queryEnv, 0);
+ }
+
return false;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f7fe366859c..fca2f84081e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1977,6 +1977,22 @@ AcquireExecutorLocks(CachedPlan *cplan)
return true;
}
+/*
+ * CachedPlanCanPrep
+ * Check whether a cached plan is eligible for pruning-aware locking
+ * via ExecutorPrepAndLock().
+ *
+ * Only single-statement reused generic plans with a non-utility command
+ * qualify.
+ */
+bool
+CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource)
+{
+ return (cplan == plansource->gplan &&
+ list_length(cplan->stmt_list) == 1 &&
+ linitial_node(PlannedStmt, cplan->stmt_list)->commandType != CMD_UTILITY);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..3a03355e6b6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -69,7 +69,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters);
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 491c4886506..fef5aadcdfa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid);
+extern void ExecutorPrepCleanup(QueryDesc *queryDesc);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 693b879f76d..8753e05152b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 14a1dfed2b9..7f6f7cda781 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -120,6 +120,16 @@ typedef struct PlannedStmt
/* RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE */
Bitmapset *resultRelationRelids;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksPrepared() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e0fc403e717..2941d3a301b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -254,4 +254,6 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+extern bool CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource);
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..61781389d2f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,187 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..3043dbfac2d 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..692415a8d9f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,119 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..6a8b8787de6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
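
For readers following the locking order in ExecutorPrepAndLock() above, here is a small standalone sketch of the same sequence: lock the unprunable rels, check validity, prune, lock only the survivors plus each ModifyTable's first result rel, and check validity again, undoing everything on failure. It is a toy model, not PostgreSQL code. Every name in it (ToyPlan, ToyLockTable, RelSet, TOY_REL, the toy_* functions) is hypothetical, and a 32-bit mask stands in for a Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i represents range table index i. */
typedef uint32_t RelSet;
#define TOY_REL(i) (UINT32_C(1) << (i))

typedef struct ToyPlan
{
	RelSet		unprunable;		/* models PlannedStmt.unprunableRelids */
	RelSet		first_result;	/* models PlannedStmt.firstResultRels */
	bool		is_valid;		/* models CachedPlan.is_valid */
} ToyPlan;

/* Per-relation lock counts held by this (single) toy backend. */
typedef struct ToyLockTable
{
	int			count[32];
} ToyLockTable;

/* Acquire or release one lock on every member of relids. */
void
toy_lock_relids(ToyLockTable *lt, RelSet relids, bool acquire)
{
	for (int i = 0; i < 32; i++)
		if (relids & TOY_REL(i))
			lt->count[i] += acquire ? 1 : -1;
}

/* Set of relations on which at least one lock is currently held. */
RelSet
toy_locked_set(const ToyLockTable *lt)
{
	RelSet		held = 0;

	for (int i = 0; i < 32; i++)
		if (lt->count[i] > 0)
			held |= TOY_REL(i);
	return held;
}

/* Survivors not already locked, plus the first result rels. */
RelSet
toy_prepared_lock_set(const ToyPlan *plan, RelSet unpruned)
{
	return (unpruned & ~plan->unprunable) | plan->first_result;
}

/* A pruning step returns the unpruned set and may invalidate the plan. */
typedef RelSet (*ToyPruneFn) (ToyPlan *plan);

RelSet
toy_prune_keep_rel3(ToyPlan *plan)
{
	return plan->unprunable | TOY_REL(3);
}

RelSet
toy_prune_keep_none(ToyPlan *plan)
{
	return plan->unprunable;
}

/* Models a pruning expression that triggers DDL and invalidates the plan. */
RelSet
toy_prune_invalidating(ToyPlan *plan)
{
	plan->is_valid = false;
	return plan->unprunable | TOY_REL(3);
}

/* Returns true with locks held, or false with every lock released. */
bool
toy_prep_and_lock(ToyPlan *plan, ToyLockTable *lt, ToyPruneFn prune)
{
	RelSet		survivors;

	toy_lock_relids(lt, plan->unprunable, true);
	if (!plan->is_valid)
	{
		toy_lock_relids(lt, plan->unprunable, false);
		return false;
	}

	survivors = toy_prepared_lock_set(plan, prune(plan));
	toy_lock_relids(lt, survivors, true);
	if (!plan->is_valid)
	{
		toy_lock_relids(lt, survivors, false);
		toy_lock_relids(lt, plan->unprunable, false);
		return false;
	}
	return true;
}
```

With rel 1 as the partitioned parent and rels 2..4 as its partitions, toy_prune_keep_rel3 leaves locks on rels 1 and 3 only, while toy_prune_invalidating leaves none behind. The real function differs in important ways (invalidation arrives via callbacks when locks are taken, and the unpruned set lives in the EState), so treat this purely as an illustration of the ordering.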
[application/octet-stream] v11-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (8.9K, 3-v11-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 1b9f7861d7162f5b20f69ea9db5dda13f64c202e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v11 3/4] Introduce ExecutorPrep and refactor executor startup
Move permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper.

ExecutorStart() invokes ExecutorPrep() when QueryDesc->estate is
NULL, keeping current behavior unchanged. If QueryDesc->estate is
already set, ExecutorStart() reuses it.

This is preparatory refactoring only. No caller outside the
executor supplies a prebuilt EState in this commit.

In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/executor/README | 10 ++-
src/backend/executor/execMain.c | 152 ++++++++++++++++++++++++++------
src/include/executor/execdesc.h | 2 +-
3 files changed, 132 insertions(+), 32 deletions(-)
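
The reuse-or-create rule described in the commit message can be sketched in isolation as follows. This is a toy model under stated assumptions, not the executor's code: ToyEState, ToyQueryDesc, toy_prep_calls and the toy_* functions are hypothetical names, and "prep" here only allocates a struct where the real ExecutorPrep() sets up the range table, checks permissions and runs initial pruning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for EState; remembers which prep call built it. */
typedef struct ToyEState
{
	int			built_by_prep_call;
} ToyEState;

/* Toy stand-in for QueryDesc; estate is NULL until prep has run. */
typedef struct ToyQueryDesc
{
	ToyEState  *estate;
	bool		started;
} ToyQueryDesc;

/* Counts prep invocations so callers can see that work is not redone. */
int			toy_prep_calls = 0;

/* Models ExecutorPrep(): build the state exactly once per QueryDesc. */
void
toy_executor_prep(ToyQueryDesc *qd)
{
	ToyEState  *es;

	assert(qd->estate == NULL);
	es = calloc(1, sizeof(*es));
	if (es == NULL)
		abort();
	es->built_by_prep_call = ++toy_prep_calls;
	qd->estate = es;
}

/* Models ExecutorStart(): prep only if nobody did it earlier. */
void
toy_executor_start(ToyQueryDesc *qd)
{
	if (qd->estate == NULL)
		toy_executor_prep(qd);
	qd->started = true;
}
```

A caller that never preps gets the old behavior (start builds the state itself), while a caller that prepped earlier keeps the same ToyEState pointer across toy_executor_start(), so the setup is not repeated. The caller owns the allocation in this sketch and releases it with free().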
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..890bc3d9333 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart, or implicitly from ExecutorStart
+ if not done earlier. Creates the EState in QueryDesc, performs
+ range table initialization, permission checks, and initial
+ partition pruning.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if QueryDesc.estate is NULL)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 45e00c6af85..735c80e08a9 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -76,6 +77,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
+static void ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags);
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
@@ -147,7 +149,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +174,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When no
+ * prep EState was provided, AcquireExecutorLocks() should have locked
+ * every relation in the plan. When one was provided, pruning-aware
+ * locking should have locked at least the unpruned relations. Both
+ * checks are skipped in parallel workers, which acquire relation locks
+ * lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ ExecutorPrep(queryDesc, CurrentResourceOwner, eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking should
+ * have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +324,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep
+ *
+ * Build the initial executor state for queryDesc before ExecutorStart().
+ *
+ * This creates the EState and performs the subset of executor startup that
+ * does not require plan-tree initialization, allowing that work to be reused
+ * by callers that need executor state before ExecutorStart():
+ *
+ * - initialize the range table
+ * - perform permission checks
+ * - perform initial partition pruning
+ *
+ * On success, queryDesc->estate is set and can later be reused by
+ * ExecutorStart() instead of rebuilding the same state.
+ *
+ * Caller must ensure that queryDesc->snapshot is active.
+ */
+static void
+ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ PlannedStmt *pstmt;
+
+ Assert(queryDesc != NULL);
+
+ if (queryDesc->operation == CMD_UTILITY)
+ return;
+
+ Assert(ActiveSnapshotSet());
+ Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ Assert(queryDesc->estate == NULL);
+
+ pstmt = queryDesc->plannedstmt;
+
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = queryDesc->params;
+ estate->es_queryEnv = queryDesc->queryEnv;
+ estate->es_top_eflags = eflags;
+
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,37 +957,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..27697760bb9 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
--
2.47.3
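For readers skimming the 3/4 patch, the control flow it introduces in ExecutorStart() can be shown with a small standalone sketch: run the prep step only when no EState exists yet, otherwise reuse the one supplied. This is an illustration only, not code from the patch. MockEState, MockQueryDesc, mock_executor_prep() and mock_executor_start() are hypothetical stand-ins, and the real ExecutorPrep() also does permission checks and initial pruning, which are reduced to a single flag here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for EState; not the real PostgreSQL struct. */
typedef struct MockEState
{
    int         range_table_ready;  /* set once the "prep" work is done */
} MockEState;

/* Hypothetical stand-in for QueryDesc. */
typedef struct MockQueryDesc
{
    MockEState *estate;             /* NULL until prep has run */
    int         prep_calls;         /* how many times prep ran */
    int         started;            /* set by mock_executor_start() */
} MockQueryDesc;

/* Models ExecutorPrep(): builds the state exactly once. */
void
mock_executor_prep(MockQueryDesc *qd)
{
    assert(qd->estate == NULL);
    qd->estate = calloc(1, sizeof(MockEState));
    if (qd->estate == NULL)
        abort();
    qd->estate->range_table_ready = 1;
    qd->prep_calls++;
}

/* Models ExecutorStart(): prep implicitly, or reuse a prebuilt state. */
void
mock_executor_start(MockQueryDesc *qd)
{
    if (qd->estate == NULL)
        mock_executor_prep(qd);
    assert(qd->estate != NULL && qd->estate->range_table_ready);
    qd->started = 1;
}
```

Calling mock_executor_start() on a zeroed descriptor runs the prep step implicitly. Calling mock_executor_prep() first and then mock_executor_start() leaves prep_calls at 1 and keeps the same estate pointer, which is the reuse path the commit message describes.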
[application/octet-stream] v11-0001-Move-execution-lock-acquisition-out-of-GetCached.patch (16.4K, 4-v11-0001-Move-execution-lock-acquisition-out-of-GetCached.patch)
From 8dc44320c7d4b20f50200d7b21c98e4058b8d6d7 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 18:38:34 +0900
Subject: [PATCH v11 1/4] Move execution lock acquisition out of
GetCachedPlan()
GetCachedPlan() previously acquired execution locks on all plan
relations as part of cached plan validation. Move this
responsibility to callers, making GetCachedPlan() return a valid
plan without holding execution locks.
Add AcquireExecutorLocks() as the caller-facing function: it locks
all relations in the plan, checks that the plan is still valid
afterward, and returns false if it was invalidated so the caller
can retry with a fresh plan.
For portal-backed callers, add PortalLockCachedPlan() in pquery.c
which wraps the lock-check-retry loop and handles the case where
replanning changes the portal strategy. Store the CachedPlanSource
pointer in PortalData so retry can call GetCachedPlan() without
the caller threading it through.
Adjust all non-portal GetCachedPlan() callers (SPI, EXPLAIN
EXECUTE, SQL functions) to call AcquireExecutorLocks() explicitly
after fetching the plan.
No behavioral change. This separates plan retrieval from execution
setup, allowing a later commit to substitute pruning-aware locking
for eligible plans.
---
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 14 +++++-
src/backend/executor/functions.c | 14 ++++--
src/backend/executor/spi.c | 22 ++++++++--
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 68 ++++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 44 ++++++++++++++-----
src/backend/utils/mmgr/portalmem.c | 7 +++
src/include/utils/plancache.h | 1 +
src/include/utils/portal.h | 3 ++
10 files changed, 155 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..03d7a98fc58 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ entry->plansource,
cplan);
/*
@@ -632,8 +633,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ for (;;)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner,
+ pstate->p_queryEnv);
+ plan_list = cplan->stmt_list;
+
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ }
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..2afb814a435 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -654,6 +654,7 @@ static bool
init_execution_state(SQLFunctionCachePtr fcache)
{
CachedPlanSource *plansource;
+ CachedPlan *cplan;
execution_state *preves = NULL;
execution_state *lasttages = NULL;
int nstmts;
@@ -696,10 +697,15 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
- fcache->cplan = GetCachedPlan(plansource,
- fcache->paramLI,
- fcache->cowner,
- NULL);
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, fcache->paramLI,
+ fcache->cowner, NULL);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, fcache->cowner);
+ }
+ fcache->cplan = cplan;
/*
* If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..268cd10bde8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ plansource,
cplan);
/*
@@ -2106,6 +2107,16 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
_SPI_current->queryEnv);
Assert(cplan == plansource->gplan);
+ if (!AcquireExecutorLocks(cplan))
+ {
+ /* Plan invalidated during locking; get a fresh one. */
+ ReleaseCachedPlan(cplan,
+ plan->saved ? CurrentResourceOwner : NULL);
+ cplan = GetCachedPlan(plansource, NULL,
+ plan->saved ? CurrentResourceOwner : NULL,
+ _SPI_current->queryEnv);
+ }
+
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2574,9 +2585,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
- cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, options->params,
+ plan_owner, _SPI_current->queryEnv);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, plan_owner);
+ }
stmt_list = cplan->stmt_list;
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 10be60011ad..aaebefcdf7a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ psrc,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..1b22515d56e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,6 +59,7 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
+static bool PortalLockCachedPlan(Portal portal);
/*
@@ -462,6 +463,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
portal->strategy = ChoosePortalStrategy(portal->stmts);
+restart:
+
/*
* Fire her up according to the strategy
*/
@@ -487,6 +490,11 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). If the plan is
+ * invalidated during locking, it replans and may change the
+ * portal strategy, requiring us to restart PortalStart().
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -496,6 +504,14 @@ PortalStart(Portal portal, ParamListInfo params,
params,
portal->queryEnv,
0);
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
/*
* If it's a scrollable cursor, executor needs to support
@@ -534,6 +550,11 @@ PortalStart(Portal portal, ParamListInfo params,
case PORTAL_ONE_RETURNING:
case PORTAL_ONE_MOD_WITH:
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
/*
* We don't start the executor until we are told to run the
@@ -577,7 +598,20 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+
+ /*
+ * GetCachedPlan() no longer acquires execution locks, so we
+ * must do it here. Multi-statement plans always use
+ * conservative locking (all partitions locked); pruning-aware
+ * locking is not feasible because PortalRunMulti() executes
+ * statements sequentially with CCI between them.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -1785,3 +1819,35 @@ EnsurePortalSnapshotExists(void)
/* PushActiveSnapshotWithLevel might have copied the snapshot */
portal->portalSnapshot = GetActiveSnapshot();
}
+
+/*
+ * PortalLockCachedPlan
+ * Acquire execution locks for a cached-plan-backed portal,
+ * retrying with a fresh plan if the current one is invalidated.
+ *
+ * Returns true if replanning changed portal->strategy, meaning the
+ * caller must redispatch. Returns false once locks are held.
+ */
+static bool
+PortalLockCachedPlan(Portal portal)
+{
+ PortalStrategy start_strategy = portal->strategy;
+
+ if (AcquireExecutorLocks(portal->cplan))
+ return false;
+
+ /* Replan. Locks will be taken freshly. */
+ ReleaseCachedPlan(portal->cplan, portal->resowner);
+ portal->cplan = NULL;
+ portal->stmts = NIL;
+ portal->cplan = GetCachedPlan(portal->plansource,
+ portal->portalParams,
+ portal->resowner,
+ portal->queryEnv);
+ portal->stmts = portal->cplan->stmt_list;
+ portal->strategy = ChoosePortalStrategy(portal->stmts);
+ if (portal->strategy != start_strategy)
+ return true;
+
+ return false;
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..f7fe366859c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,7 +100,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksInt(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +945,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan. Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +984,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
-
/*
* If plan was transient, check to see if TransactionXmin has
* advanced, and if so invalidate it.
@@ -1003,9 +1002,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
/* Successfully revalidated and locked the query. */
return true;
}
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1282,8 +1278,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but no execution locks are held.
+ * The caller must call AcquireExecutorLocks() before executing.
+ * For freshly built plans (custom or new generic), the planner
+ * already holds the needed locks, so AcquireExecutorLocks() is
+ * redundant but harmless.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1906,9 +1905,11 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksInt(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1956,27 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * AcquireExecutorLocks
+ * Acquire execution locks on all relations in a cached plan.
+ *
+ * Returns true if the plan is still valid after locking. Returns
+ * false if the plan was invalidated while locks were being acquired,
+ * in which case the locks have been released and the caller should
+ * discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+bool
+AcquireExecutorLocks(CachedPlan *cplan)
+{
+ AcquireExecutorLocksInt(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocksInt(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..613f3be30b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,10 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If plansource is provided, it is the CachedPlanSource that produced
+ * cplan. PortalLockCachedPlan() uses it to fetch a fresh plan if the
+ * current one is invalidated during execution lock acquisition.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +290,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +304,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->plansource = plansource;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
@@ -517,6 +523,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* drop cached plan reference, if any */
PortalReleaseCachedPlan(portal);
+ portal->plansource = NULL;
/*
* If portal has a snapshot protecting its data, release that. This needs
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..e0fc403e717 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool AcquireExecutorLocks(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..3af535362cd 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,8 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ CachedPlanSource *plansource; /* CachedPlanSource, for replanning on
+ * invalidation */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +242,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
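The lock-then-recheck loop that the 1/4 patch adds at each non-portal GetCachedPlan() call site can be modelled in isolation. The sketch below is only an illustration under stated assumptions: MockPlan, MockPlanSource and the mock_* functions are hypothetical stand-ins rather than plancache APIs, and an invalidation is simulated by a counter instead of a real sinval message arriving during lock acquisition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CachedPlan; only what the sketch needs. */
typedef struct MockPlan
{
    bool        is_valid;       /* cleared when an invalidation arrives */
    int         locks_held;     /* relation locks currently held */
    int         nrels;          /* relations referenced by the plan */
} MockPlan;

/* Hypothetical stand-in for CachedPlanSource. */
typedef struct MockPlanSource
{
    MockPlan    plan;
    int         pending_invalidations; /* will fire during locking */
    int         nfetches;              /* plan fetches so far */
} MockPlanSource;

/* Models GetCachedPlan(): returns a valid plan, no execution locks held. */
MockPlan *
mock_get_cached_plan(MockPlanSource *src)
{
    src->nfetches++;
    src->plan.is_valid = true;
    src->plan.locks_held = 0;
    return &src->plan;
}

/* Models AcquireExecutorLocks(): lock everything, then recheck validity. */
bool
mock_acquire_executor_locks(MockPlanSource *src, MockPlan *plan)
{
    plan->locks_held = plan->nrels;

    /* Simulation only: taking locks may process a pending invalidation. */
    if (src->pending_invalidations > 0)
    {
        src->pending_invalidations--;
        plan->is_valid = false;
    }

    if (!plan->is_valid)
    {
        plan->locks_held = 0;   /* release the now-useless locks */
        return false;
    }
    return true;
}

/* The caller-side retry loop: fetch, lock, retry if invalidated. */
MockPlan *
mock_fetch_locked_plan(MockPlanSource *src)
{
    for (;;)
    {
        MockPlan   *plan = mock_get_cached_plan(src);

        if (mock_acquire_executor_locks(src, plan))
        {
            assert(plan->is_valid && plan->locks_held == plan->nrels);
            return plan;
        }
        /* The patch calls ReleaseCachedPlan() at this point. */
    }
}
```

With two simulated invalidations pending, the loop fetches the plan three times before returning one that is both valid and fully locked. The point of the shape is that validity is only trusted after the locks are held.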
[application/octet-stream] v11-0002-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 5-v11-0002-Refactor-executor-s-initial-partition-pruning-se.patch)
From ddc05ba324ab0347b2219ead1740a14617029f30 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v11 2/4] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions would be added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() would have already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
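The bookkeeping change in the 2/4 patch is easier to see with a toy model of how es_unpruned_relids gets filled. The sketch below is an assumption-laden illustration, not the patch's code: a uint64_t bitmask replaces Bitmapset, and MockPruneInfo, MockEState and the mock_* functions are hypothetical names. It models only the two cases the patch comments describe. When initial pruning runs, the survivors are added by the caller. When initial pruning steps exist but are skipped, as with EXPLAIN (GENERIC_PLAN), the state-creation step adds every leaf partition itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per range table index; a toy replacement for Bitmapset. */
#define RTI_BIT(rti) (UINT64_C(1) << (rti))

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct MockPruneInfo
{
    uint64_t    leaf_rtis;          /* all leaf partitions under this node */
    uint64_t    surviving_rtis;     /* what initial pruning would keep */
    bool        has_initial_steps;  /* any initial pruning steps at all? */
} MockPruneInfo;

/* Hypothetical stand-in for EState. */
typedef struct MockEState
{
    uint64_t    unpruned_relids;      /* seeded with unprunable relids */
    bool        skip_initial_pruning; /* e.g. EXPLAIN (GENERIC_PLAN) */
} MockEState;

/*
 * Models CreatePartitionPruneState(): returns whether initial pruning
 * should run, and records all leaf partitions directly when pruning
 * steps exist but are being skipped.
 */
bool
mock_create_prune_state(MockEState *estate, const MockPruneInfo *info)
{
    bool        do_initial_prune;

    do_initial_prune = info->has_initial_steps &&
        !estate->skip_initial_pruning;

    if (info->has_initial_steps && !do_initial_prune)
        estate->unpruned_relids |= info->leaf_rtis;

    return do_initial_prune;
}

/* Models ExecDoInitialPruning(): adds only the survivors, if pruning ran. */
void
mock_do_initial_pruning(MockEState *estate, const MockPruneInfo *infos,
                        size_t ninfos)
{
    assert(infos != NULL || ninfos == 0);

    for (size_t i = 0; i < ninfos; i++)
    {
        uint64_t    valid_rtis = 0;

        if (mock_create_prune_state(estate, &infos[i]))
            valid_rtis = infos[i].surviving_rtis;

        estate->unpruned_relids |= valid_rtis;
    }
}
```

In this model the caller no longer needs an out parameter for the skipped-pruning case. It always ORs in valid_rtis, which stays empty unless pruning actually ran.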
* Re: generic plans and "initial" pruning
@ 2026-05-27 12:03 ` Thom Brown <[email protected]>
2026-05-28 08:13 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Thom Brown @ 2026-05-27 12:03 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Sat, 4 Apr 2026 at 13:11, Amit Langote <[email protected]> wrote:
>
> Attached is a redesigned version. While working on the previous
> design, I grew increasingly uncomfortable with CachedPlanPrepData --
> it was smuggling executor state out of GetCachedPlan() through an
> out-parameter, which papered over the real problem: GetCachedPlan()
> was doing too much. The main change in this version is architectural:
> GetCachedPlan() no longer acquires execution locks. Callers now own
> that responsibility, which is natural because each call site iterates
> stmt_list differently and manages execution state in its own way --
> and it lets them choose between conservative lock-all and
> pruning-aware locking where appropriate.
>
> Non-portal call sites remain on the conservative path for now.
> _SPI_execute_plan requires care around snapshot setup, which happens
> after plan fetch rather than before. SQL functions have a different
> issue: init_execution_state() fetches the plan while postquel_start()
> handles execution, with execution_state containers in between, making
> it harder to thread a prepped QueryDesc through. The portal path and
> EXPLAIN EXECUTE cover the most common
> prepared-statement-with-partitions workloads; the remaining sites can
> be converted incrementally.
>
> This is now starting to feel closer to what Tom suggested back in
> January 2023 [1], where he proposed getting rid of
> AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> lock acquisition out to callers. He noted that "we'd be pushing the
> responsibility for looping back and re-planning out to fairly
> high-level calling code" and that "we'd definitely be changing some
> fundamental APIs." That is the direction I came around to over the
> last couple of weeks while wrestling with CachedPlanPrepData. The
> reverted approach also tried to follow Tom's direction but moved
> locking into ExecutorStart(), which forced it to handle plan
> invalidation from inside the executor by mutating the CachedPlan
> in-place. This version moves locking out to the callers instead, so
> the executor and plan cache never reach into each other.
>
> The series is now four patches:
>
> 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> AcquireExecutorLocks() as a caller-facing function with validity check
> and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> portal retry logic. All callers are converted. No behavioral change.
>
> 0002: Refactor executor's initial partition pruning setup. Cleanup
> only, no behavioral change.
>
> 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> range table init, permission checks, and initial pruning out of
> InitPlan(). Scaffolding for 0004; all callers still go through the
> normal ExecutorStart() path.
>
> 0004: Use pruning-aware locking for single-statement cached plans.
> Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> ExecutorPrep() to determine surviving partitions, then locks only
> those. Extends PortalLockCachedPlan() with a pruning-aware path for
> eligible plans. Multi-statement CachedPlans (from rule rewriting)
> always use conservative locking. In principle, this could be relaxed
> if the planner can prove that no pruning expression reads state
> modified by an earlier statement, but that is left for a future patch.
> Includes regression tests.
>
> In case it's not clear, I'm not targeting v19 at this point. I'd like
> to get this into v20 CF1 and would welcome review from anyone
> interested.
After not having looked at this in close to 2 years, I thought I'd
give it another look. I haven't found any user-facing issues, and I
like seeing so few locks in pg_locks. With pruning disabled, the
fallback works; pruning-aware locking works via SPI through plpgsql;
and both running ALTER between executions and invalidating indexes
force replans. It's looking good.
However, I think there might be a bug in patch 0001. I'd appreciate
someone checking my reasoning, because I'm not fully confident I've
been diligent enough.
When PortalStart() opens a SELECT cursor that's backed by a cached
plan, it does roughly the following. It builds a queryDesc (an
executor-side struct), one of whose fields is a pointer into the plan
tree inside the portal's cached plan. Then it calls
PortalLockCachedPlan() to acquire the necessary locks, and finally
hands the queryDesc over to the executor.
My worry is about what happens if the cached plan turns out to be
stale, for instance because someone ran DDL on a referenced table. In
that case PortalLockCachedPlan() throws the old plan away (via
ReleaseCachedPlan) and fetches a freshly-built replacement, updating
the portal's own pointers to match. But the queryDesc from earlier
isn't touched. Its plan pointer still references the old, now-released
plan. From what I can see, once that old plan's last reference is
dropped its memory can be freed, which would leave the executor
reading from freed memory in the next step.
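To make the hazard concrete, here is a minimal toy model of that sequence. Everything in it (ToyPlan, ToyQueryDesc, ToyPortal, the toy_* functions, the fixed arena) is a hypothetical stand-in, not PostgreSQL code, and it simply assumes that a plan is reclaimed as soon as its last reference is dropped, which is exactly the point I'm unsure about. Reclaimed plans are only flagged rather than freed, so the stale pointer can be inspected without touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL structs. */
typedef struct ToyPlan
{
    int  refcount;          /* number of holders */
    bool live;              /* false once the "memory" is reclaimed */
} ToyPlan;

typedef struct ToyQueryDesc
{
    const ToyPlan *plan;    /* borrowed pointer; takes no reference */
} ToyQueryDesc;

typedef struct ToyPortal
{
    ToyPlan *cplan;         /* the portal's counted reference */
} ToyPortal;

/*
 * Plans live in a fixed arena and slots are never reused, so a reclaimed
 * plan can be examined safely instead of dereferencing freed memory.
 */
#define TOY_MAX_PLANS 8
static ToyPlan toy_arena[TOY_MAX_PLANS];
static size_t  toy_next_slot = 0;

static ToyPlan *
toy_get_plan(void)
{
    ToyPlan *plan;

    assert(toy_next_slot < TOY_MAX_PLANS);
    plan = &toy_arena[toy_next_slot++];
    plan->refcount = 1;
    plan->live = true;
    return plan;
}

/* Assumption: dropping the last reference reclaims the plan at once. */
static void
toy_release_plan(ToyPlan *plan)
{
    assert(plan->refcount > 0);
    if (--plan->refcount == 0)
        plan->live = false;
}

/* Models the replan branch: release the old plan, fetch a new one. */
static void
toy_replan(ToyPortal *portal)
{
    toy_release_plan(portal->cplan);
    portal->cplan = toy_get_plan();
}

/*
 * Build the descriptor first, then replan.  Returns true if the
 * descriptor ends up pointing at a reclaimed plan while the portal
 * itself points at a live replacement.
 */
bool
toy_desc_dangles_after_replan(void)
{
    ToyPortal    portal;
    ToyQueryDesc qd;

    toy_next_slot = 0;          /* start from an empty arena */
    portal.cplan = toy_get_plan();
    qd.plan = portal.cplan;     /* built before locking */

    toy_replan(&portal);        /* invalidation arrived in the window */

    return !qd.plan->live && qd.plan != portal.cplan && portal.cplan->live;
}
```

In this model the portal is consistent after the replan, but the descriptor is not, because it borrowed the pointer without taking a reference of its own.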
The bit I'm least sure about is whether the old plan's memory really
does get reclaimed straight away when its refcount hits zero. If
something keeps it alive longer then this isn't a bug, or at least not
as bad as I'm making out. I had a look but couldn't convince myself
either way from the code alone. To actually hit this you'd need a
cursor on a cached plan, plus an invalidation arriving in the small
window between the portal being set up and the cursor being opened.
The race condition is brief, and I've not been able to hit it in
testing.
The thing that got me thinking this is real: patch 0004 modifies
PortalLockCachedPlan() so that whenever it replans, it also rebuilds
the queryDesc. That's pretty much the fix I'd expect for this, which
makes me suspect somebody hit it at some point. But 0004 only applies
that fix on the new pruning-aware code path, and it was mentioned in
the thread that 0001 to 0003 might land before 0004. If so, master
would carry the bug in the gap between the two.
I suspect a way to deal with it would be to move the CreateQueryDesc
call in the SELECT case to after PortalLockCachedPlan() returns, which
is what the other portal strategies already seem to do. Alternatively,
you could bring 0004's changes in this area into 0001 and have
PortalLockCachedPlan() always rebuild the queryDesc when it replans.
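As a rough sketch of the first option, the toy model below contrasts the two orderings. The names (ToyPlan, ToyPortal, ToyQueryDesc, toy_lock_cached_plan, toy_create_query_desc) are hypothetical stand-ins rather than the real functions, and a plan "generation" counter is an assumption used in place of a pointer so that staleness is easy to check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL structs. */
typedef struct ToyPlan
{
    int  generation;        /* bumped each time the plan is rebuilt */
    bool valid;             /* false if an invalidation has arrived */
} ToyPlan;

typedef struct ToyPortal
{
    ToyPlan plan;
} ToyPortal;

typedef struct ToyQueryDesc
{
    int plan_generation;    /* which plan this descriptor was built from */
} ToyQueryDesc;

/* Stand-in for the lock step: replans if the plan was invalidated. */
static void
toy_lock_cached_plan(ToyPortal *portal)
{
    if (!portal->plan.valid)
    {
        portal->plan.generation++;
        portal->plan.valid = true;
    }
}

/* Stand-in for descriptor creation: captures the current plan. */
static ToyQueryDesc
toy_create_query_desc(const ToyPortal *portal)
{
    ToyQueryDesc qd = { .plan_generation = portal->plan.generation };

    return qd;
}

/* Suggested order: lock (and possibly replan) first, then build. */
bool
toy_start_lock_then_build(bool invalidate_first)
{
    ToyPortal    portal = { .plan = { .generation = 1,
                                      .valid = !invalidate_first } };
    ToyQueryDesc qd;

    toy_lock_cached_plan(&portal);
    qd = toy_create_query_desc(&portal);
    return qd.plan_generation == portal.plan.generation;
}

/* Order under discussion, for contrast: build first, then lock. */
bool
toy_start_build_then_lock(bool invalidate_first)
{
    ToyPortal    portal = { .plan = { .generation = 1,
                                      .valid = !invalidate_first } };
    ToyQueryDesc qd;

    qd = toy_create_query_desc(&portal);
    toy_lock_cached_plan(&portal);
    return qd.plan_generation == portal.plan.generation;
}
```

Each function returns true when the descriptor matches the portal's current plan. Only the build-then-lock ordering can return false, and only when an invalidation lands before the lock step, which matches the narrow window described above.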
If I've got this wrong and there's some lifetime mechanism I missed
that keeps the old plan's memory alive, then it's a non-issue and I'm
misreading the code. If I have got it wrong, could you please add
comments to make what is going on clearer?
Regards
Thom
* Re: generic plans and "initial" pruning
@ 2026-05-28 08:13 ` Amit Langote <[email protected]>
2026-05-28 13:13 ` Re: generic plans and "initial" pruning Thom Brown <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-05-28 08:13 UTC (permalink / raw)
To: Thom Brown <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
Hi Thom,
On Wed, May 27, 2026 at 9:03 PM Thom Brown <[email protected]> wrote:
>
> On Sat, 4 Apr 2026 at 13:11, Amit Langote <[email protected]> wrote:
> >
> > Attached is a redesigned version. While working on the previous
> > design, I grew increasingly uncomfortable with CachedPlanPrepData --
> > it was smuggling executor state out of GetCachedPlan() through an
> > out-parameter, which papered over the real problem: GetCachedPlan()
> > was doing too much. The main change in this version is architectural:
> > GetCachedPlan() no longer acquires execution locks. Callers now own
> > that responsibility, which is natural because each call site iterates
> > stmt_list differently and manages execution state in its own way --
> > and it lets them choose between conservative lock-all and
> > pruning-aware locking where appropriate.
> >
> > Non-portal call sites remain on the conservative path for now.
> > _SPI_execute_plan requires care around snapshot setup, which happens
> > after plan fetch rather than before. SQL functions have a different
> > issue: init_execution_state() fetches the plan while postquel_start()
> > handles execution, with execution_state containers in between, making
> > it harder to thread a prepped QueryDesc through. The portal path and
> > EXPLAIN EXECUTE cover the most common
> > prepared-statement-with-partitions workloads; the remaining sites can
> > be converted incrementally.
> >
> > This is now starting to feel closer to what Tom suggested back in
> > January 2023 [1], where he proposed getting rid of
> > AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> > lock acquisition out to callers. He noted that "we'd be pushing the
> > responsibility for looping back and re-planning out to fairly
> > high-level calling code" and that "we'd definitely be changing some
> > fundamental APIs." That is the direction I came around to over the
> > last couple of weeks while wrestling with CachedPlanPrepData. The
> > reverted approach also tried to follow Tom's direction but moved
> > locking into ExecutorStart(), which forced it to handle plan
> > invalidation from inside the executor by mutating the CachedPlan
> > in-place. This version moves locking out to the callers instead, so
> > the executor and plan cache never reach into each other.
> >
> > The series is now four patches:
> >
> > 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> > AcquireExecutorLocks() as a caller-facing function with validity check
> > and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> > portal retry logic. All callers are converted. No behavioral change.
> >
> > 0002: Refactor executor's initial partition pruning setup. Cleanup
> > only, no behavioral change.
> >
> > 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> > range table init, permission checks, and initial pruning out of
> > InitPlan(). Scaffolding for 0004; all callers still go through the
> > normal ExecutorStart() path.
> >
> > 0004: Use pruning-aware locking for single-statement cached plans.
> > Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> > ExecutorPrep() to determine surviving partitions, then locks only
> > those. Extends PortalLockCachedPlan() with a pruning-aware path for
> > eligible plans. Multi-statement CachedPlans (from rule rewriting)
> > always use conservative locking. In principle, this could be relaxed
> > if the planner can prove that no pruning expression reads state
> > modified by an earlier statement, but that is left for a future patch.
> > Includes regression tests.
> >
> > In case it's not clear, I'm not targeting v19 at this point. I'd like
> > to get this into v20 CF1 and would welcome review from anyone
> > interested.
>
> After not having looked at this in close to 2 years, I thought I'd
> give it another look.
Thanks for taking a look.
> I haven't found any user-facing issues, and I like seeing so few locks
> in pg_locks. With pruning disabled, the fallback works; pruning-aware
> locking works via SPI through plpgsql; and both running ALTER between
> executions and invalidating indexes force replans. It's looking good.
>
> However, I think there might be a bug in patch 0001. I'd appreciate
> someone checking my reasoning, because I'm not fully confident I've
> been diligent enough.
>
> When PortalStart() opens a SELECT cursor that's backed by a cached
> plan, it does roughly the following. It builds a queryDesc (an
> executor-side struct), one of whose fields is a pointer into the plan
> tree inside the portal's cached plan. Then it calls
> PortalLockCachedPlan() to acquire the necessary locks, and finally
> hands the queryDesc over to the executor.
>
> My worry is about what happens if the cached plan turns out to be
> stale, for instance because someone ran DDL on a referenced table. In
> that case PortalLockCachedPlan() throws the old plan away (via
> ReleaseCachedPlan) and fetches a freshly-built replacement, updating
> the portal's own pointers to match. But the queryDesc from earlier
> isn't touched. Its plan pointer still references the old, now-released
> plan. From what I can see, once that old plan's last reference is
> dropped its memory can be freed, which would leave the executor
> reading from freed memory in the next step.
>
> The bit I'm least sure about is whether the old plan's memory really
> does get reclaimed straight away when its refcount hits zero. If
> something keeps it alive longer then this isn't a bug, or at least not
> as bad as I'm making out. I had a look but couldn't convince myself
> either way from the code alone. To actually hit this you'd need a
> cursor on a cached plan, plus an invalidation arriving in the small
> window between the portal being set up and the cursor being opened.
> The race condition is brief, and I've not been able to hit it in
> testing.
>
> The thing that got me thinking this is real: patch 0004 modifies
> PortalLockCachedPlan() so that whenever it replans, it also rebuilds
> the queryDesc. That's pretty much the fix I'd expect for this, which
> makes me suspect somebody hit it at some point. But 0004 only applies
> that fix on the new pruning-aware code path, and it was mentioned in
> the thread that 0001 to 0003 might land before 0004. If so, master
> would carry the bug in the gap between the two.
>
> I suspect a way to deal with it would be to move the CreateQueryDesc
> call in the SELECT case to after PortalLockCachedPlan() returns, which
> is what the other portal strategies already seem to do. Alternatively,
> you could bring 0004's changes in this area into 0001 and have
> PortalLockCachedPlan() always rebuild the queryDesc when it replans.
>
> If I've got this wrong and there's some lifetime mechanism I missed
> that keeps the old plan's memory alive, then it's a non-issue and I'm
> misreading the code. If I have got it wrong, could you please add
> comments to make what is going on clearer?
It's a real bug.
You're right that if PortalLockCachedPlan() replans, the QueryDesc
created before the call still points at the old PlannedStmt from the
released plan. And yes, 0004 happens to fix it by rebuilding the
QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
broken on their own.
Attached is an updated set with the fix: CreateQueryDesc now runs
after PortalLockCachedPlan() returns, as you suggested. That said,
I'll probably focus first on settling the plancache refactoring that
spun off from this thread [1], and then start a new thread for the
pruning-aware locking work on top of it, incorporating parts of this
series.
--
Thanks, Amit Langote
[1] https://www.postgresql.org/message-id/CA%2BHiwqE1ntHy2h9zJ9v3MwAkoGAveSERcHWkDTTZnP0kxWqbKQ%40mail.g...
Attachments:
[application/octet-stream] v12-0001-Move-execution-lock-acquisition-out-of-GetCached.patch (16.2K, 2-v12-0001-Move-execution-lock-acquisition-out-of-GetCached.patch)
From a3214580f2ce1983a111af07ccb092ba03c812c8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 18:38:34 +0900
Subject: [PATCH v12 1/4] Move execution lock acquisition out of
GetCachedPlan()
GetCachedPlan() previously acquired execution locks on all plan
relations as part of cached plan validation. Move this
responsibility to callers, making GetCachedPlan() return a valid
plan without holding execution locks.
Add AcquireExecutorLocks() as the caller-facing function: it locks
all relations in the plan, checks that the plan is still valid
afterward, and returns false if it was invalidated so the caller
can retry with a fresh plan.
For portal-backed callers, add PortalLockCachedPlan() in pquery.c
which wraps the lock-check-retry loop and handles the case where
replanning changes the portal strategy. Store the CachedPlanSource
pointer in PortalData so retry can call GetCachedPlan() without
the caller threading it through.
Adjust all non-portal GetCachedPlan() callers (SPI, EXPLAIN
EXECUTE, SQL functions) to call AcquireExecutorLocks() explicitly
after fetching the plan.
No behavioral change. This separates plan retrieval from execution
setup, allowing a later commit to substitute pruning-aware locking
for eligible plans.
---
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 14 +++++-
src/backend/executor/functions.c | 14 ++++--
src/backend/executor/spi.c | 22 +++++++--
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 70 ++++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 44 +++++++++++++-----
src/backend/utils/mmgr/portalmem.c | 7 +++
src/include/utils/plancache.h | 1 +
src/include/utils/portal.h | 3 ++
10 files changed, 157 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..03d7a98fc58 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ entry->plansource,
cplan);
/*
@@ -632,8 +633,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ for (;;)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner,
+ pstate->p_queryEnv);
+ plan_list = cplan->stmt_list;
+
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ }
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..2afb814a435 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -654,6 +654,7 @@ static bool
init_execution_state(SQLFunctionCachePtr fcache)
{
CachedPlanSource *plansource;
+ CachedPlan *cplan;
execution_state *preves = NULL;
execution_state *lasttages = NULL;
int nstmts;
@@ -696,10 +697,15 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
- fcache->cplan = GetCachedPlan(plansource,
- fcache->paramLI,
- fcache->cowner,
- NULL);
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, fcache->paramLI,
+ fcache->cowner, NULL);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, fcache->cowner);
+ }
+ fcache->cplan = cplan;
/*
* If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..268cd10bde8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ plansource,
cplan);
/*
@@ -2106,6 +2107,16 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
_SPI_current->queryEnv);
Assert(cplan == plansource->gplan);
+ if (!AcquireExecutorLocks(cplan))
+ {
+ /* Plan invalidated during locking; get a fresh one. */
+ ReleaseCachedPlan(cplan,
+ plan->saved ? CurrentResourceOwner : NULL);
+ cplan = GetCachedPlan(plansource, NULL,
+ plan->saved ? CurrentResourceOwner : NULL,
+ _SPI_current->queryEnv);
+ }
+
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2574,9 +2585,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
- cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, options->params,
+ plan_owner, _SPI_current->queryEnv);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, plan_owner);
+ }
stmt_list = cplan->stmt_list;
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbef734a93f..2929f158338 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1243,6 +1243,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2042,6 +2043,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ psrc,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index ee731000820..4699b53cab7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,6 +59,7 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
+static bool PortalLockCachedPlan(Portal portal);
/*
@@ -463,6 +464,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
portal->strategy = ChoosePortalStrategy(portal->stmts);
+restart:
+
/*
* Fire her up according to the strategy
*/
@@ -485,6 +488,21 @@ PortalStart(Portal portal, ParamListInfo params,
* non-default nesting level for the snapshot.
*/
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). If the plan is
+ * invalidated during locking, it replans and may change the
+ * portal strategy, requiring us to restart PortalStart().
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -535,6 +553,11 @@ PortalStart(Portal portal, ParamListInfo params,
case PORTAL_ONE_RETURNING:
case PORTAL_ONE_MOD_WITH:
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
/*
* We don't start the executor until we are told to run the
@@ -578,7 +601,20 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+
+ /*
+ * GetCachedPlan() no longer acquires execution locks, so we
+ * must do it here. Multi-statement plans always use
+ * conservative locking (all partitions locked); pruning-aware
+ * locking is not feasible because PortalRunMulti() executes
+ * statements sequentially with CCI between them.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -1786,3 +1822,35 @@ EnsurePortalSnapshotExists(void)
/* PushActiveSnapshotWithLevel might have copied the snapshot */
portal->portalSnapshot = GetActiveSnapshot();
}
+
+/*
+ * PortalLockCachedPlan
+ * Acquire execution locks for a cached-plan-backed portal,
+ * retrying with a fresh plan if the current one is invalidated.
+ *
+ * Returns true if replanning changed portal->strategy, meaning the
+ * caller must redispatch. Returns false once locks are held.
+ */
+static bool
+PortalLockCachedPlan(Portal portal)
+{
+ PortalStrategy start_strategy = portal->strategy;
+
+ if (AcquireExecutorLocks(portal->cplan))
+ return false;
+
+ /* Replan. Locks will be taken freshly. */
+ ReleaseCachedPlan(portal->cplan, portal->resowner);
+ portal->cplan = NULL;
+ portal->stmts = NIL;
+ portal->cplan = GetCachedPlan(portal->plansource,
+ portal->portalParams,
+ portal->resowner,
+ portal->queryEnv);
+ portal->stmts = portal->cplan->stmt_list;
+ portal->strategy = ChoosePortalStrategy(portal->stmts);
+ if (portal->strategy != start_strategy)
+ return true;
+
+ return false;
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..f7fe366859c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,7 +100,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksInt(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +945,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan. Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +984,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
-
/*
* If plan was transient, check to see if TransactionXmin has
* advanced, and if so invalidate it.
@@ -1003,9 +1002,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
/* Successfully revalidated and locked the query. */
return true;
}
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1282,8 +1278,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but no execution locks are held.
+ * The caller must call AcquireExecutorLocks() before executing.
+ * For freshly built plans (custom or new generic), the planner
+ * already holds the needed locks, so AcquireExecutorLocks() is
+ * redundant but harmless.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1906,9 +1905,11 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksInt(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1956,27 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * AcquireExecutorLocks
+ * Acquire execution locks on all relations in a cached plan.
+ *
+ * Returns true if the plan is still valid after locking. Returns
+ * false if the plan was invalidated while locks were being acquired,
+ * in which case the locks have been released and the caller should
+ * discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+bool
+AcquireExecutorLocks(CachedPlan *cplan)
+{
+ AcquireExecutorLocksInt(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocksInt(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..613f3be30b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,10 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If plansource is provided, it is the CachedPlanSource that produced
+ * cplan. PortalLockCachedPlan() uses it to fetch a fresh plan if the
+ * current one is invalidated during execution lock acquisition.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +290,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +304,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->plansource = plansource;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
@@ -517,6 +523,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* drop cached plan reference, if any */
PortalReleaseCachedPlan(portal);
+ portal->plansource = NULL;
/*
* If portal has a snapshot protecting its data, release that. This needs
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..e0fc403e717 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool AcquireExecutorLocks(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..3af535362cd 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,8 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ CachedPlanSource *plansource; /* CachedPlanSource, for replanning on
+ * invalidation */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +242,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
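For readers following the new contract in 0001, here is a minimal sketch of the lock-then-recheck pattern that a caller of the bool-returning AcquireExecutorLocks() is expected to follow. It is not PostgreSQL code: MockPlan, mock_lock_all(), mock_acquire_executor_locks() and mock_get_locked_plan() are hypothetical stand-ins, and the invalidation is simulated by a flag rather than delivered by the real invalidation machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CachedPlan; only what this sketch needs. */
typedef struct MockPlan
{
    bool        is_valid;           /* cleared when an invalidation arrives */
    int         locks_held;         /* relation locks currently held */
    int         nrels;              /* relations referenced by the plan */
    bool        invalidate_on_lock; /* simulate invalidation during locking */
} MockPlan;

/* Models AcquireExecutorLocksInt(): take or release every lock. */
static void
mock_lock_all(MockPlan *plan, bool acquire)
{
    if (acquire)
    {
        plan->locks_held += plan->nrels;
        /* Taking a lock can process a pending invalidation message. */
        if (plan->invalidate_on_lock)
            plan->is_valid = false;
    }
    else
        plan->locks_held -= plan->nrels;
}

/* Models the new AcquireExecutorLocks(): lock, recheck, undo on failure. */
static bool
mock_acquire_executor_locks(MockPlan *plan)
{
    mock_lock_all(plan, true);
    if (!plan->is_valid)
    {
        mock_lock_all(plan, false);
        return false;
    }
    return true;
}

/*
 * Caller-side retry loop.  The first stale_attempts tries are invalidated
 * while locking; returns the number of attempts that were needed.
 */
static int
mock_get_locked_plan(MockPlan *plan, int stale_attempts)
{
    int         attempts = 0;

    for (;;)
    {
        attempts++;
        /* Stand-in for GetCachedPlan(): hands back a valid plan. */
        plan->is_valid = true;
        plan->invalidate_on_lock = (attempts <= stale_attempts);
        if (mock_acquire_executor_locks(plan))
            return attempts;
        /* A real caller would ReleaseCachedPlan() here before retrying. */
    }
}
```

Calling mock_get_locked_plan(&plan, 2) on a plan with four relations takes three attempts and leaves exactly four locks held, since each failed attempt releases what it took.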
[application/octet-stream] v12-0002-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 3-v12-0002-Refactor-executor-s-initial-partition-pruning-se.patch)
From 29e5ad113f6974a94fbcf984b43fa3ed86f57632 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v12 2/4] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions are added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() has already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions had better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
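The bookkeeping change in 0002 can be modelled roughly as follows. This is a toy sketch, not the patch's code: RelidSet is a 64-bit mask standing in for Bitmapset, and MockEState, MockPruneInfo and the mock_* functions are hypothetical names. It only shows that the "pruning skipped" case now writes straight into the EState from the create function, while the "pruning performed" case still goes through the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy relid set: bit i set means RT index i is a member (i < 64). */
typedef uint64_t RelidSet;

#define MAX_PARTS 8

typedef struct MockEState
{
    RelidSet    unpruned_relids;    /* stands in for es_unpruned_relids */
} MockEState;

typedef struct MockPruneInfo
{
    bool        has_initial_steps;  /* initial_pruning_steps != NIL */
    bool        do_initial_prune;   /* false for EXPLAIN (GENERIC_PLAN) */
    int         nparts;
    int         leafpart_rti[MAX_PARTS];    /* 0 means no scan subnode */
    bool        survives[MAX_PARTS];        /* outcome of initial pruning */
} MockPruneInfo;

static RelidSet
relid_add(RelidSet set, int rti)
{
    return set | (UINT64_C(1) << rti);
}

/* Models CreatePartitionPruneState(): records all leaves if pruning is skipped. */
static void
mock_create_prune_state(MockEState *estate, const MockPruneInfo *info)
{
    if (info->has_initial_steps && !info->do_initial_prune)
    {
        for (int i = 0; i < info->nparts; i++)
        {
            if (info->leafpart_rti[i] != 0)
                estate->unpruned_relids =
                    relid_add(estate->unpruned_relids, info->leafpart_rti[i]);
        }
    }
}

/* Models ExecFindMatchingSubPlans(): RT indexes of surviving partitions. */
static RelidSet
mock_find_survivors(const MockPruneInfo *info)
{
    RelidSet    result = 0;

    for (int i = 0; i < info->nparts; i++)
    {
        if (info->survives[i] && info->leafpart_rti[i] != 0)
            result = relid_add(result, info->leafpart_rti[i]);
    }
    return result;
}

/* Models one iteration of the loop in ExecDoInitialPruning(). */
static void
mock_do_initial_pruning(MockEState *estate, const MockPruneInfo *info)
{
    RelidSet    valid = 0;

    mock_create_prune_state(estate, info);
    if (info->do_initial_prune)
        valid = mock_find_survivors(info);
    estate->unpruned_relids |= valid;
}
```

With three partitions at RT indexes 2, 3 and 4 where only the first and last survive, the pruned run ends with bits 2 and 4 set, and the skipped-pruning run ends with bits 2, 3 and 4 set. No out parameter is needed in either case.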
[application/octet-stream] v12-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (8.8K, 4-v12-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 05c92346e2bec4c8ec9a7cf45ec572c15d64481f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v12 3/4] Introduce ExecutorPrep and refactor executor startup
Move permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper.
ExecutorStart() invokes ExecutorPrep() when QueryDesc->estate is
NULL, keeping current behavior unchanged. If QueryDesc->estate is
already set, ExecutorStart() reuses it.
This is preparatory refactoring only. No caller outside the
executor supplies a prebuilt EState in this commit.
In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/executor/README | 10 ++-
src/backend/executor/execMain.c | 152 ++++++++++++++++++++++++++------
src/include/executor/execdesc.h | 2 +-
3 files changed, 132 insertions(+), 32 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..890bc3d9333 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart, or implicitly from ExecutorStart
+ if not done earlier. Creates the EState in QueryDesc, performs
+ range table initialization, permission checks, and initial
+ partition pruning.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if QueryDesc.estate is NULL)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b30f768680..2b9397b72f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -76,6 +77,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
+static void ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags);
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
@@ -147,7 +149,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +174,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When no
+ * prep EState was provided, AcquireExecutorLocks() should have locked
+ * every relation in the plan. When one was provided, pruning-aware
+ * locking should have locked at least the unpruned relations. Both
+ * checks are skipped in parallel workers, which acquire relation locks
+ * lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ ExecutorPrep(queryDesc, CurrentResourceOwner, eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking should
+ * have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -274,6 +333,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep
+ *
+ * Build the initial executor state for queryDesc before ExecutorStart().
+ *
+ * This creates the EState and performs the subset of executor startup that
+ * does not require plan-tree initialization, allowing that work to be reused
+ * by callers that need executor state before ExecutorStart():
+ *
+ * - initialize the range table
+ * - perform permission checks
+ * - perform initial partition pruning
+ *
+ * On success, queryDesc->estate is set and can later be reused by
+ * ExecutorStart() instead of rebuilding the same state.
+ *
+ * Caller must ensure that queryDesc->snapshot is active.
+ */
+static void
+ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ PlannedStmt *pstmt;
+
+ Assert(queryDesc != NULL);
+
+ if (queryDesc->operation == CMD_UTILITY)
+ return;
+
+ Assert(ActiveSnapshotSet());
+ Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ Assert(queryDesc->estate == NULL);
+
+ pstmt = queryDesc->plannedstmt;
+
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = queryDesc->params;
+ estate->es_queryEnv = queryDesc->queryEnv;
+ estate->es_top_eflags = eflags;
+
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -849,37 +966,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 37c2576e4bc..aea5ec8ea02 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -45,7 +45,7 @@ typedef struct QueryDesc
int query_instr_options; /* OR of InstrumentOption flags for
* query_instr */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
--
2.47.3
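The control flow introduced by 0003 boils down to "prepare lazily, reuse if already prepared". A minimal sketch of that shape, using hypothetical MockQueryDesc and MockEState types and mock_* functions rather than the real executor structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EState. */
typedef struct MockEState
{
    int         prep_runs;          /* how many times prep work was done */
    bool        plan_initialized;   /* set by the "InitPlan" step */
} MockEState;

/* Hypothetical stand-in for QueryDesc. */
typedef struct MockQueryDesc
{
    MockEState *estate;             /* NULL until prep has run */
    MockEState  storage;            /* backing store, avoids malloc here */
} MockQueryDesc;

/* Models ExecutorPrep(): range table setup, permission checks, pruning. */
static void
mock_executor_prep(MockQueryDesc *qd)
{
    assert(qd->estate == NULL);     /* must not be prepared twice */
    qd->storage.prep_runs++;
    qd->estate = &qd->storage;
}

/* Models ExecutorStart(): prep only if nobody did it earlier. */
static void
mock_executor_start(MockQueryDesc *qd)
{
    if (qd->estate == NULL)
        mock_executor_prep(qd);

    /* Plan-tree initialization always happens here. */
    qd->estate->plan_initialized = true;
}
```

Whether the caller runs mock_executor_prep() first or goes straight to mock_executor_start(), the prep work happens exactly once, which is the property the commit message relies on when it says existing behavior is unchanged.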
[application/octet-stream] v12-0004-Use-pruning-aware-locking-for-single-statement-c.patch (40.7K, 5-v12-0004-Use-pruning-aware-locking-for-single-statement-c.patch)
From c68d5de848572defbb58625d915f3323245294d4 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 20:43:14 +0900
Subject: [PATCH v12 4/4] Use pruning-aware locking for single-statement cached
plans
For single-statement reused generic plans, perform initial partition
pruning before acquiring execution locks, then lock only the
surviving partitions.
Add ExecutorPrepAndLock() which encapsulates the pruning-aware lock
sequence: lock unprunable relations, call ExecutorPrep() to run
initial pruning, then lock survivors. Plan validity is checked
after each step; ExecutorPrepCleanup() handles the case where the
plan is invalidated between prep and execution.
Extend PortalLockCachedPlan() to use the pruning-aware path for
eligible plans (single-statement reused generic, non-utility).
All other cases continue using the conservative lock-all path
from the previous commit.
Track firstResultRels in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving ExecInitModifyTable()
assumptions about the first result relation being available.
Multi-statement CachedPlans (from rule rewriting) always use
conservative locking, since PortalRunMulti() executes statements
sequentially with CCI between them and later statements' pruning
expressions may depend on earlier ones' effects. In principle,
this could be relaxed if the planner can prove that no pruning
expression reads state modified by an earlier statement, but that
is left for a future patch.
Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning results.
Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this
was an optimization; it is now a correctness requirement, because
pruned partitions may not be locked.
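The sequence described above (lock unprunable relations, prune, lock survivors plus first result relations, rechecking validity after each locking step) can be sketched as below. This is an illustrative model only: RelidSet is a 64-bit mask, locks are tracked as a plain set rather than counted as in the real lock manager, and MockStmt, MockLockTable and the mock_* functions are hypothetical. mock_rel_is_usable() shows the membership check the note for extension authors asks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy relid set: bit i set means RT index i is a member (i < 64). */
typedef uint64_t RelidSet;

typedef struct MockStmt
{
    RelidSet    unprunable;         /* always locked up front */
    RelidSet    survivors;          /* partitions passing initial pruning */
    RelidSet    first_result;       /* first result rel of each ModifyTable */
    bool        prune_invalidates;  /* pruning expression triggers DDL */
} MockStmt;

typedef struct MockLockTable
{
    RelidSet    held;               /* simplification: a set, not counters */
} MockLockTable;

static void
mock_lock_relids(MockLockTable *lt, RelidSet relids, bool acquire)
{
    if (acquire)
        lt->held |= relids;
    else
        lt->held &= ~relids;
}

/* Models ExecutorPrepAndLock(); *unpruned receives the pruning result. */
static bool
mock_prep_and_lock(MockLockTable *lt, const MockStmt *stmt,
                   bool *is_valid, RelidSet *unpruned)
{
    RelidSet    extra;

    /* Step 1: lock unprunable rels before pruning can touch them. */
    mock_lock_relids(lt, stmt->unprunable, true);
    if (!*is_valid)
    {
        mock_lock_relids(lt, stmt->unprunable, false);
        return false;
    }

    /* Step 2: stand-in for ExecutorPrep(), which runs initial pruning. */
    if (stmt->prune_invalidates)
        *is_valid = false;
    *unpruned = stmt->unprunable | stmt->survivors;

    /* Step 3: lock survivors, plus first result rels even if pruned. */
    extra = (*unpruned & ~stmt->unprunable) | stmt->first_result;
    mock_lock_relids(lt, extra, true);
    if (!*is_valid)
    {
        mock_lock_relids(lt, extra, false);
        mock_lock_relids(lt, stmt->unprunable, false);
        return false;
    }
    return true;
}

/* The check extension code needs before opening a partition. */
static bool
mock_rel_is_usable(RelidSet unpruned, int rti)
{
    return ((unpruned >> rti) & 1) != 0;
}
```

In this model a first result relation that was pruned ends up locked but still absent from the unpruned set, so code that consults only the lock state would wrongly treat it as scannable; the membership test is what distinguishes the two.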
---
src/backend/commands/explain.c | 45 +++--
src/backend/commands/prepare.c | 30 ++-
src/backend/executor/execMain.c | 142 ++++++++++++++
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/pquery.c | 76 ++++++--
src/backend/utils/cache/plancache.c | 16 ++
src/include/commands/explain.h | 3 +-
src/include/executor/executor.h | 4 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 2 +
src/test/regress/expected/partition_prune.out | 184 ++++++++++++++++++
src/test/regress/expected/plancache.out | 63 ++++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++++
src/test/regress/sql/plancache.sql | 52 +++++
17 files changed, 731 insertions(+), 39 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 112c17b0d64..c5254f0f920 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -377,7 +377,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ NULL);
}
/*
@@ -501,7 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters)
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -532,13 +534,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
/*
* We discard the output if we have no use for it. If we're explaining
* CREATE TABLE AS, we'd better use the appropriate tuple receiver, while
@@ -554,10 +549,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
else
dest = None_Receiver;
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ /*
+ * Create a QueryDesc for the query, or use the one provided by the
+ * caller. When reusing a prep QueryDesc, its snapshot was set at
+ * creation time; we push it as active for ExecutorStart and override the
+ * destination and instrument options, which were not known when the
+ * caller created it.
+ */
+ if (prep_qd)
+ {
+ PushActiveSnapshot(GetActiveSnapshot());
+ queryDesc = prep_qd;
+ Assert(queryDesc->dest == None_Receiver);
+ queryDesc->dest = dest;
+ queryDesc->instrument_options = instrument_option;
+ }
+ else
+ {
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv,
+ instrument_option);
+ }
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 03d7a98fc58..3bbbc052149 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -588,6 +588,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ QueryDesc *prep_qd = NULL;
if (es->memory)
{
@@ -640,8 +641,31 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
pstate->p_queryEnv);
plan_list = cplan->stmt_list;
- if (AcquireExecutorLocks(cplan))
+ if (!CachedPlanCanPrep(cplan, entry->plansource))
+ {
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ continue;
+ }
+
+ prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, plan_list),
+ query_string,
+ GetActiveSnapshot(),
+ InvalidSnapshot,
+ None_Receiver, /* ExplainOnePlan will fix */
+ paramLI,
+ pstate->p_queryEnv,
+ 0 /* ExplainOnePlan will fix */ );
+ if (ExecutorPrepAndLock(prep_qd,
+ CurrentResourceOwner,
+ es->generic ? EXEC_FLAG_EXPLAIN_GENERIC : 0,
+ &cplan->is_valid))
break;
+
+ /* Try again. */
+ ExecutorPrepCleanup(prep_qd);
+ FreeQueryDesc(prep_qd);
ReleaseCachedPlan(cplan, CurrentResourceOwner);
}
@@ -664,6 +688,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(prep_qd == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
@@ -671,7 +696,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ prep_qd);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2b9397b72f3..1e81377cfd8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -333,6 +333,124 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * LockRangeTableRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksPrepared() and ExecutorPrepAndLock().
+ */
+static void
+LockRangeTableRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not fail
+ * if it's been dropped entirely --- we'll just transiently acquire a
+ * non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksPrepared
+ *
+ * Acquire or release execution locks using pruning results already computed
+ * by ExecutorPrep() and stored in queryDesc->estate.
+ *
+ * This is intended for single-statement reused generic-plan paths that
+ * choose pruning-aware locking instead of the conservative
+ * AcquireExecutorLocks() path.
+ */
+static void
+AcquireExecutorLocksPrepared(QueryDesc *queryDesc, bool acquire)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ Bitmapset *lock_relids;
+ ListCell *lc;
+
+ Assert(queryDesc != NULL);
+ Assert(estate != NULL);
+ Assert(plannedstmt != NULL);
+ Assert(plannedstmt->commandType != CMD_UTILITY);
+
+ lock_relids = bms_difference(estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Keep the first result relation of each ModifyTable locked even if
+ * pruning removed all target partitions. ExecInitModifyTable() relies on
+ * one such relation remaining available.
+ */
+ foreach(lc, plannedstmt->firstResultRels)
+ {
+ Index rti = lfirst_int(lc);
+
+ lock_relids = bms_add_member(lock_relids, rti);
+ }
+
+ LockRangeTableRelids(plannedstmt->rtable, lock_relids, acquire);
+
+ bms_free(lock_relids);
+
+}
+
+/*
+ * ExecutorPrepAndLock
+ * Perform pruning-aware locking for a single PlannedStmt.
+ *
+ * Locks unprunable relations first, then runs ExecutorPrep() to
+ * determine which partitions survive initial pruning, then locks
+ * only those survivors. Checks *is_valid after each locking step
+ * to detect plan invalidation (e.g., from concurrent DDL or DDL
+ * triggered by a pruning expression).
+ *
+ * Returns true if the plan is still valid and all needed locks are
+ * held. Returns false if the plan was invalidated at any point, in
+ * which case all acquired locks have been released and the caller
+ * should discard the QueryDesc and retry with a fresh plan.
+ */
+bool
+ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid)
+{
+ PlannedStmt *pstmt = queryDesc->plannedstmt;
+
+ /* Lock unprunable rels before pruning can access them. */
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, true);
+ if (!*is_valid)
+ {
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ /* Run pruning and lock survivors. */
+ ExecutorPrep(queryDesc, owner, eflags);
+ AcquireExecutorLocksPrepared(queryDesc, true);
+ if (!*is_valid)
+ {
+ AcquireExecutorLocksPrepared(queryDesc, false);
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ return true;
+}
+
/*
* ExecutorPrep
*
@@ -391,6 +509,30 @@ ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
CurrentResourceOwner = oldowner;
}
+/*
+ * ExecutorPrepCleanup
+ * Clean up an EState that was created by ExecutorPrep() but never
+ * passed to ExecutorStart(). This happens when the plan is
+ * invalidated between prep and execution, and the caller must
+ * discard the prepped state before retrying with a fresh plan.
+ *
+ * Unlike ExecutorEnd(), this does not expect a fully initialized
+ * plan state tree -- only the range table relations and the
+ * EState itself need to be freed.
+ */
+void
+ExecutorPrepCleanup(QueryDesc *queryDesc)
+{
+ EState *estate = queryDesc->estate;
+
+ if (estate == NULL)
+ return;
+
+ ExecCloseRangeTableRelations(estate);
+ FreeExecutorState(estate);
+ queryDesc->estate = NULL;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 478cb01783c..350096bfbe7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -5133,8 +5133,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -5148,6 +5148,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f4689e7c9f8..4cddac7f2fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -675,6 +675,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ff0e875f2a2..6ee51f06920 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of initially
+ * prunable relations. We use bms_next_member() to get the
+ * lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition expansion
+ * preserves RT index order. ExecInitModifyTable() asserts that the
+ * recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4699b53cab7..53c50ab0fce 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,7 +59,9 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
-static bool PortalLockCachedPlan(Portal portal);
+static bool PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **queryDesc_p);
/*
@@ -488,21 +490,6 @@ restart:
* non-default nesting level for the snapshot.
*/
- /*
- * If the portal is backed by a cached plan, acquire execution
- * locks via PortalLockCachedPlan(). If the plan is
- * invalidated during locking, it replans and may change the
- * portal strategy, requiring us to restart PortalStart().
- */
- if (portal->cplan)
- {
- if (PortalLockCachedPlan(portal))
- {
- PopActiveSnapshot();
- goto restart;
- }
- }
-
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -516,6 +503,26 @@ restart:
portal->queryEnv,
0);
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). For eligible plans
+ * (single-statement reused generic), this performs
+ * pruning-aware locking: it runs ExecutorPrep() on the
+ * QueryDesc to determine which partitions survive initial
+ * pruning, then locks only those. If the plan is invalidated
+ * during this process, it replans and rebuilds the QueryDesc.
+ * If replanning changes the portal strategy, we must restart
+ * PortalStart() to redispatch.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal, true, params, &queryDesc))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
@@ -555,7 +562,7 @@ restart:
case PORTAL_ONE_MOD_WITH:
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -611,7 +618,7 @@ restart:
*/
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -1828,15 +1835,32 @@ EnsurePortalSnapshotExists(void)
* Acquire execution locks for a cached-plan-backed portal,
* retrying with a fresh plan if the current one is invalidated.
*
+ * If do_prep is true and the plan is eligible (single-statement reused
+ * generic plan), performs pruning-aware locking via ExecutorPrep() and
+ * populates portal->queryDesc with the prepped QueryDesc. Otherwise
+ * falls back to locking all relations in the plan.
+ *
* Returns true if replanning changed portal->strategy, meaning the
- * caller must redispatch. Returns false once locks are held.
+ * caller must redispatch. Returns false once locks are held and the
+ * plan is valid for execution.
*/
static bool
-PortalLockCachedPlan(Portal portal)
+PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd)
{
PortalStrategy start_strategy = portal->strategy;
- if (AcquireExecutorLocks(portal->cplan))
+ if (do_prep && CachedPlanCanPrep(portal->cplan, portal->plansource))
+ {
+ Assert(prep_qd);
+ if (ExecutorPrepAndLock(*prep_qd, portal->resowner, 0,
+ &portal->cplan->is_valid))
+ return false;
+ ExecutorPrepCleanup(*prep_qd);
+ FreeQueryDesc(*prep_qd);
+ }
+ else if (AcquireExecutorLocks(portal->cplan))
return false;
/* Replan. Locks will be taken freshly. */
@@ -1852,5 +1876,15 @@ PortalLockCachedPlan(Portal portal)
if (portal->strategy != start_strategy)
return true;
+ if (prep_qd)
+ {
+ Assert(list_length(portal->stmts) == 1);
+ *prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->sourceText,
+ GetActiveSnapshot(), InvalidSnapshot,
+ None_Receiver, params,
+ portal->queryEnv, 0);
+ }
+
return false;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f7fe366859c..fca2f84081e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1977,6 +1977,22 @@ AcquireExecutorLocks(CachedPlan *cplan)
return true;
}
+/*
+ * CachedPlanCanPrep
+ * Check whether a cached plan is eligible for pruning-aware locking
+ * via ExecutorPrepAndLock().
+ *
+ * Only single-statement reused generic plans with a non-utility command
+ * qualify.
+ */
+bool
+CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource)
+{
+ return (cplan == plansource->gplan &&
+ list_length(cplan->stmt_list) == 1 &&
+ linitial_node(PlannedStmt, cplan->stmt_list)->commandType != CMD_UTILITY);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..3a03355e6b6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -69,7 +69,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters);
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 33bbdbfeffb..093be9bd24b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid);
+extern void ExecutorPrepCleanup(QueryDesc *queryDesc);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27a2c6815b7..a5d00633b4b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 14a1dfed2b9..7f6f7cda781 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -120,6 +120,16 @@ typedef struct PlannedStmt
/* RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE */
Bitmapset *resultRelationRelids;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e0fc403e717..2941d3a301b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -254,4 +254,6 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+extern bool CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource);
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 849049f9c51..ec73866486e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4956,3 +4956,187 @@ select * from (select a, b from phv_boolpart) t
(2 rows)
drop table phv_boolpart;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index d58534ca1cd..54077294dce 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -402,3 +402,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 359a9208056..a98844d14f8 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1518,3 +1518,119 @@ select * from (select a, b from phv_boolpart) t
group by grouping sets (a, b);
drop table phv_boolpart;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index aed388d03a1..90b6c5f82bf 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -228,3 +228,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
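
Editorial sketch, not part of the patch: the control flow of ExecutorPrepAndLock() above can be modeled in a few lines of standalone C. Everything here is a hypothetical stand-in (ToyPlan, ToyLockTable, toy_lock, toy_prune, toy_prep_and_lock); none of it is PostgreSQL API. Relations are bits in a mask, "pruning" keeps the single partition whose bit index equals the parameter, and an invalidation is simulated as arriving on the Nth lock-acquisition call. The point is only the shape: lock the unprunable set, check validity, prune, lock the survivors, check again, and release everything taken so far on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy types: one bit per range-table index. */
typedef uint32_t RelidSet;

typedef struct ToyPlan
{
	RelidSet	unprunable;		/* rels that are always locked */
	RelidSet	prunable;		/* partitions subject to initial pruning */
	bool		is_valid;		/* cleared when an invalidation is observed */
} ToyPlan;

typedef struct ToyLockTable
{
	RelidSet	held;			/* locks currently held */
	int			acquires;		/* number of acquire calls so far */
	int			inval_on_acquire;	/* acquire call that observes an
									 * invalidation; 0 means never */
} ToyLockTable;

/* Acquire or release a set of locks; acquiring may reveal an invalidation. */
static void
toy_lock(ToyLockTable *locks, ToyPlan *plan, RelidSet rels, bool acquire)
{
	if (acquire)
	{
		locks->held |= rels;
		locks->acquires++;
		if (locks->acquires == locks->inval_on_acquire)
			plan->is_valid = false;
	}
	else
		locks->held &= ~rels;
}

/* Toy "initial pruning": only the partition at bit index param survives. */
static RelidSet
toy_prune(const ToyPlan *plan, int param)
{
	assert(param >= 0 && param < 32);
	return plan->prunable & ((RelidSet) 1 << param);
}

/*
 * Returns true with all needed locks held, or false with every lock
 * taken by this call released, so the caller can replan and retry.
 */
static bool
toy_prep_and_lock(ToyLockTable *locks, ToyPlan *plan, int param)
{
	RelidSet	survivors;

	/* Step 1: lock unprunable rels before pruning can touch them. */
	toy_lock(locks, plan, plan->unprunable, true);
	if (!plan->is_valid)
	{
		toy_lock(locks, plan, plan->unprunable, false);
		return false;
	}

	/* Step 2: prune, then lock only the surviving partitions. */
	survivors = toy_prune(plan, param);
	toy_lock(locks, plan, survivors, true);
	if (!plan->is_valid)
	{
		toy_lock(locks, plan, survivors, false);
		toy_lock(locks, plan, plan->unprunable, false);
		return false;
	}

	return true;
}
```

In this model a caller would loop: on a false return it builds a fresh plan (resetting is_valid) and calls toy_prep_and_lock() again, which loosely corresponds to what PortalLockCachedPlan() does after ExecutorPrepCleanup() and FreeQueryDesc() in the patch.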
* Re: generic plans and "initial" pruning
@ 2026-05-28 13:13 ` Thom Brown <[email protected]>
2026-05-29 08:56 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Thom Brown @ 2026-05-28 13:13 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Thu, 28 May 2026 at 09:14, Amit Langote <[email protected]> wrote:
>
> Hi Thom,
>
> On Wed, May 27, 2026 at 9:03 PM Thom Brown <[email protected]> wrote:
> >
> > On Sat, 4 Apr 2026 at 13:11, Amit Langote <[email protected]> wrote:
> > >
> > > Attached is a redesigned version. While working on the previous
> > > design, I grew increasingly uncomfortable with CachedPlanPrepData --
> > > it was smuggling executor state out of GetCachedPlan() through an
> > > out-parameter, which papered over the real problem: GetCachedPlan()
> > > was doing too much. The main change in this version is architectural:
> > > GetCachedPlan() no longer acquires execution locks. Callers now own
> > > that responsibility, which is natural because each call site iterates
> > > stmt_list differently and manages execution state in its own way --
> > > and it lets them choose between conservative lock-all and
> > > pruning-aware locking where appropriate.
> > >
> > > Non-portal call sites remain on the conservative path for now.
> > > _SPI_execute_plan requires care around snapshot setup, which happens
> > > after plan fetch rather than before. SQL functions have a different
> > > issue: init_execution_state() fetches the plan while postquel_start()
> > > handles execution, with execution_state containers in between, making
> > > it harder to thread a prepped QueryDesc through. The portal path and
> > > EXPLAIN EXECUTE cover the most common
> > > prepared-statement-with-partitions workloads; the remaining sites can
> > > be converted incrementally.
> > >
> > > This is now starting to feel closer to what Tom suggested back in
> > > January 2023 [1], where he proposed getting rid of
> > > AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> > > lock acquisition out to callers. He noted that "we'd be pushing the
> > > responsibility for looping back and re-planning out to fairly
> > > high-level calling code" and that "we'd definitely be changing some
> > > fundamental APIs." That is the direction I came around to over the
> > > last couple of weeks while wrestling with CachedPlanPrepData. The
> > > reverted approach also tried to follow Tom's direction but moved
> > > locking into ExecutorStart(), which forced it to handle plan
> > > invalidation from inside the executor by mutating the CachedPlan
> > > in-place. This version moves locking out to the callers instead, so
> > > the executor and plan cache never reach into each other.
> > >
> > > The series is now four patches:
> > >
> > > 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> > > AcquireExecutorLocks() as a caller-facing function with validity check
> > > and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> > > portal retry logic. All callers are converted. No behavioral change.
> > >
> > > 0002: Refactor executor's initial partition pruning setup. Cleanup
> > > only, no behavioral change.
> > >
> > > 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> > > range table init, permission checks, and initial pruning out of
> > > InitPlan(). Scaffolding for 0004; all callers still go through the
> > > normal ExecutorStart() path.
> > >
> > > 0004: Use pruning-aware locking for single-statement cached plans.
> > > Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> > > ExecutorPrep() to determine surviving partitions, then locks only
> > > those. Extends PortalLockCachedPlan() with a pruning-aware path for
> > > eligible plans. Multi-statement CachedPlans (from rule rewriting)
> > > always use conservative locking. In principle, this could be relaxed
> > > if the planner can prove that no pruning expression reads state
> > > modified by an earlier statement, but that is left for a future patch.
> > > Includes regression tests.
> > >
> > > In case it's not clear, I'm not targeting v19 at this point. I'd like
> > > to get this into v20 CF1 and would welcome review from anyone
> > > interested.
> >
> > After not having looked at this in close to 2 years, I thought I'd
> > give it another look.
>
> Thanks for taking a look.
>
> > Not found any user-facing issues, and I'm liking
> > seeing so few locks in pg_locks. I can see that with pruning disabled,
> > the fallback works, pruning-aware locking is working via SPI through
> > plpgsql, running ALTER between executions and also invalidating
> > indexes force replans, and it's looking good.
> >
> > But I also think there might be a bug in patch 0001, but I'd
> > appreciate checking my reasoning because I'm not fully confident I've
> > been diligent enough.
> >
> > When PortalStart() opens a SELECT cursor that's backed by a cached
> > plan, it does roughly the following. It builds a queryDesc (an
> > executor-side struct), one of whose fields is a pointer into the plan
> > tree inside the portal's cached plan. Then it calls
> > PortalLockCachedPlan() to acquire the necessary locks, and finally
> > hands the queryDesc over to the executor.
> >
> > My worry is about what happens if the cached plan turns out to be
> > stale, for instance because someone ran DDL on a referenced table. In
> > that case PortalLockCachedPlan() throws the old plan away (via
> > ReleaseCachedPlan) and fetches a freshly-built replacement, updating
> > the portal's own pointers to match. But the queryDesc from earlier
> > isn't touched. Its plan pointer still references the old, now-released
> > plan. From what I can see, once that old plan's last reference is
> > dropped its memory can be freed, which would leave the executor
> > reading from freed memory in the next step.
> >
> > The bit I'm least sure about is whether the old plan's memory really
> > does get reclaimed straight away when its refcount hits zero. If
> > something keeps it alive longer then this isn't a bug, or at least not
> > as bad as I'm making out. I had a look but couldn't convince myself
> > either way from the code alone. To actually hit this you'd need a
> > cursor on a cached plan, plus an invalidation arriving in the small
> > window between the portal being set up and the cursor being opened.
> > The race condition is brief, and I've not been able to hit it in
> > testing.
> >
> > The thing that got me thinking this is real: patch 0004 modifies
> > PortalLockCachedPlan() so that whenever it replans, it also rebuilds
> > the queryDesc. That's pretty much the fix I'd expect for this, which
> > makes me suspect somebody hit it at some point. But 0004 only applies
> > that fix on the new pruning-aware code path, and it was mentioned in
> > the thread that 0001 to 0003 might land before 0004. If so, master
> > would carry the bug in the gap between the two.
> >
> > I suspect a way to deal with it would be to move the CreateQueryDesc
> > call in the SELECT case to after PortalLockCachedPlan() returns, which
> > is what the other portal strategies already seem to do. Alternatively,
> > you could bring 0004's changes in this area into 0001 and have
> > PortalLockCachedPlan() always rebuild the queryDesc when it replans.
> >
> > If I've got this wrong and there's some lifetime mechanism I missed
> > that keeps the old plan's memory alive, then it's a non-issue and I'm
> > misreading the code. If I have got it wrong, could you please add
> > comments to make what is going on clearer?
>
> It's a real bug.
>
> You're right that if PortalLockCachedPlan() replans, the QueryDesc
> created before the call still points at the old PlannedStmt from the
> released plan. And yes, 0004 happens to fix it by rebuilding the
> QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> broken on their own.
>
> Attached is an updated set with the fix: CreateQueryDesc now runs
> after PortalLockCachedPlan() returns, as you suggested. That said,
> I'll probably focus first on settling the plancache refactoring that
> spun off from this thread [1], and then start a new thread for the
> pruning-aware locking work on top of it, incorporating parts of this
> series.
Thanks.

I've done another pass. I see a reference to
AcquireExecutorLocksUnpruned(), but I can't find this function. Is
this supposed to be AcquireExecutorLocksPrepared()?

And also I have a question about the new firstResultRels code.

If I've followed it right, the bit in setrefs.c records the
lowest-numbered RT index from leaf_result_relids as the
per-ModifyTable fallback that's used when all real targets get pruned
away, and the executor side looks it up via
linitial_int(node->resultRelations). For that to work those two have
to pick the same RT index, and the comment justifies it with
"partition expansion preserves RT index order". Where is that
preservation guaranteed?

And with the assertion in ExecInitModifyTable:

Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));

With writable CTEs producing more than one ModifyTable node the list
has several entries, so all the assert really checks is that some
recorded entry matches, not that the one recorded for this particular
node matches. If that's correct, then in a case where the wrong entry
happened to line up, the right relation wouldn't be locked and nothing
would complain. Is there something that keeps these in order
somewhere?

Thom
^ permalink raw reply [nested|flat] 66+ messages in thread
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-22 15:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
2025-02-22 17:02 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-23 08:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-23 12:46 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-25 02:51 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 03:06 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-20 13:25 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 15:38 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-21 10:22 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-22 08:12 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-06-20 12:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-17 12:11 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-22 06:43 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-12 14:17 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-17 12:50 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-20 07:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-24 03:29 ` Re: generic plans and "initial" pruning Chao Li <[email protected]>
2025-11-25 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-02-11 04:05 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-07 09:54 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-09 04:41 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-19 17:20 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-25 07:39 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-26 09:24 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-27 09:00 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-04-04 12:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-05-27 12:03 ` Re: generic plans and "initial" pruning Thom Brown <[email protected]>
2026-05-28 08:13 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-05-28 13:13 ` Re: generic plans and "initial" pruning Thom Brown <[email protected]>
@ 2026-05-29 08:56 ` Amit Langote <[email protected]>
2026-05-29 10:30 ` Re: generic plans and "initial" pruning Thom Brown <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Amit Langote @ 2026-05-29 08:56 UTC (permalink / raw)
To: Thom Brown <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Thu, May 28, 2026 at 10:14 PM Thom Brown <[email protected]> wrote:
> On Thu, 28 May 2026 at 09:14, Amit Langote <[email protected]> wrote:
> > It's a real bug.
> >
> > You're right that if PortalLockCachedPlan() replans, the QueryDesc
> > created before the call still points at the old PlannedStmt from the
> > released plan. And yes, 0004 happens to fix it by rebuilding the
> > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> > broken on their own.
> >
> > Attached is an updated set with the fix: CreateQueryDesc now runs
> > after PortalLockCachedPlan() returns, as you suggested. That said,
> > I'll probably focus first on settling the plancache refactoring that
> > spun off from this thread [1], and then start a new thread for the
> > pruning-aware locking work on top of it, incorporating parts of this
> > series.
>
> Thanks.
>
> I've done another pass. I see a reference to
> AcquireExecutorLocksUnpruned(), but I can't find this function. Is
> this supposed to be AcquireExecutorLocksPrepared()?
You're right, stale comment. It should say
AcquireExecutorLocksPrepared(). Fixed.
> And also I have a question about the new firstResultRels code
>
> If I've followed it right, the bit in setrefs.c records the
> lowest-numbered RT index from leaf_result_relids as the
> per-ModifyTable fallback that's used when all real targets get pruned
> away, and the executor side looks it up via
> linitial_int(node->resultRelations). For that to work those two have
> to pick the same RT index, and the comment justifies it with
> "partition expansion preserves RT index order". Where is that
> preservation guaranteed?
The ordering comes from expand_inherited_rtentry(), which adds child
partitions to the range table sequentially in partition bound order.
Since ModifyTable.resultRelations is built from the same expansion,
its first element is the lowest-numbered RT index among the leaf
partitions for that node. That is the same value
bms_next_member(leaf_result_relids, -1) returns from the Bitmapset,
because Bitmapset iteration returns members in ascending order. I've
added a comment in setrefs.c pointing to expand_inherited_rtentry() as
the source of this guarantee.
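
To make the ordering argument concrete, here is a small standalone sketch. It is a toy model, not PostgreSQL code: toy_bitmapset, toy_bms_from_array(), toy_bms_next_member() and toy_fallbacks_agree() are hypothetical names, and the set is limited to RT indexes 0..63. It only shows that the first element of an ascending list and the lowest member of the corresponding set are the same value, and that the two can differ if the list is not in ascending order.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t toy_bitmapset;

/* Build a set from an array of RT indexes (each must be in 0..63). */
static toy_bitmapset
toy_bms_from_array(const int *rtis, int n)
{
    toy_bitmapset set = 0;

    for (int i = 0; i < n; i++)
        set |= (UINT64_C(1) << rtis[i]);
    return set;
}

/*
 * Return the smallest member greater than prevbit, or -2 if there is none,
 * mirroring the calling convention of bms_next_member().
 */
static int
toy_bms_next_member(toy_bitmapset set, int prevbit)
{
    for (int bit = prevbit + 1; bit < 64; bit++)
    {
        if (set & (UINT64_C(1) << bit))
            return bit;
    }
    return -2;
}

/*
 * True if the first list element equals the lowest set member, i.e. the
 * list-based lookup and the set-based lookup pick the same RT index.
 */
static int
toy_fallbacks_agree(const int *result_relations, int n)
{
    toy_bitmapset leaf_result_relids;

    if (n <= 0)
        return 0;
    leaf_result_relids = toy_bms_from_array(result_relations, n);
    return result_relations[0] == toy_bms_next_member(leaf_result_relids, -1);
}
```

With result relations {5, 6, 7} the two lookups agree on 5. With a hypothetical out-of-order list {6, 5, 7} they would not, which is why the guarantee depends on the expansion order.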
> And with the assertion in ExecInitModifyTable:
>
> Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
>
> With writable CTEs producing more than one ModifyTable node the list
> has several entries, so all the assert really checks is that some
> recorded entry matches, not that the one recorded for this particular
> node matches. If that's correct, then in a case where the wrong entry
> happened to line up the right relation wouldn't be locked and nothing
> would complain. Is there something that keeps these in order
> somewhere?
This is a fair observation -- the Assert checks membership in the
global list rather than per-node correspondence. But node A's rti
can't accidentally pass the Assert by matching an entry recorded for
node B. Each ModifyTable node gets its own partition expansion with
distinct RT entries. In a writable CTE like:

    WITH upd1 AS (UPDATE t SET ...),
         upd2 AS (UPDATE t SET ...)
    UPDATE t SET ...

each UPDATE creates a separate set of leaf partition RT entries --
upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE
gets 11,12,13. The global firstResultRels list would be [5, 8, 11].
When ExecInitModifyTable falls back to linitial_int(resultRelations)
for a given node, it finds that node's own entry, because the RT index
sets are disjoint across nodes.
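
The disjointness argument can be sketched with the example numbers above. This is a toy model under stated assumptions, not the patch's code: ToyModifyTable and the toy_* helpers are hypothetical, and plain int arrays stand in for resultRelations and firstResultRels.

```c
#include <assert.h>

/* One ModifyTable node's leaf result relations, as RT indexes (toy model). */
typedef struct ToyModifyTable
{
    const int  *result_relations;   /* ascending RT indexes */
    int         nrels;
} ToyModifyTable;

/* Collect each node's first result relation, like the global list. */
static void
toy_build_first_result_rels(const ToyModifyTable *nodes, int nnodes, int *out)
{
    for (int i = 0; i < nnodes; i++)
        out[i] = nodes[i].result_relations[0];
}

/* Return the position of rti in the list, or -1 if it is absent. */
static int
toy_list_position(const int *list, int n, int rti)
{
    for (int i = 0; i < n; i++)
    {
        if (list[i] == rti)
            return i;
    }
    return -1;
}

/* True if no RT index appears in more than one node. */
static int
toy_nodes_disjoint(const ToyModifyTable *nodes, int nnodes)
{
    for (int a = 0; a < nnodes; a++)
        for (int b = a + 1; b < nnodes; b++)
            for (int i = 0; i < nodes[a].nrels; i++)
                for (int j = 0; j < nodes[b].nrels; j++)
                    if (nodes[a].result_relations[i] ==
                        nodes[b].result_relations[j])
                        return 0;
    return 1;
}
```

For nodes with RT indexes {5,6,7}, {8,9,10} and {11,12,13}, the list comes out as [5, 8, 11], and each node's first relation is found only at that node's own position. If two nodes ever shared an RT index, toy_nodes_disjoint() would return 0 and the membership test alone could no longer tell the nodes apart.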
That said, it's worth being explicit about what protections exist at
each layer, since this is safety-critical code:

1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in
firstResultRels unconditionally. So regardless of which rti a
ModifyTable node falls back to, the relation will be locked.

2. ExecGetRangeTableRelation() has two checks when opening a relation.
For non-result relations (isResultRel=false), it checks
es_unpruned_relids and raises an ERROR in release builds if the
relation was pruned. For result relations (isResultRel=true), that
check is intentionally skipped -- it has to be, because at least one
result relation per ModifyTable node must remain openable even when
all partitions are pruned, since executor code paths like ExecMerge()
and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized
(see commit 28317de723b). The remaining protection for result
relations is Assert(CheckRelationLockedByMe()) inside table_open,
which fires in debug builds.

3. I've tightened ExecInitModifyTable to close this gap: the
all-pruned fallback path now raises an elog(ERROR) in release builds
if linitial_int(resultRelations) is not found in firstResultRels,
rather than just an Assert. This gives result relations a
production-visible check comparable to what es_unpruned_relids
provides for scan relations.

So the net effect is that for scan relations, opening a
pruned-and-unlocked relation is caught by an ERROR in production via
es_unpruned_relids. For result relations on the all-pruned fallback
path, it's now also caught by an ERROR in production via the
firstResultRels check in ExecInitModifyTable. The locking in
AcquireExecutorLocksPrepared() ensures the relation is always locked
regardless.
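
As a rough illustration of the difference between the old Assert and the tightened check, here is a standalone sketch. It is not the code in the patch: toy_list_member_int() and toy_check_fallback_rel() are hypothetical helpers over a plain int array, and the point is only that the result is computed and returned in every build, whereas an assertion is compiled out of non-assert builds.

```c
#include <assert.h>
#include <stdbool.h>

/* Membership test over a plain int array standing in for firstResultRels. */
static bool
toy_list_member_int(const int *list, int n, int rti)
{
    for (int i = 0; i < n; i++)
    {
        if (list[i] == rti)
            return true;
    }
    return false;
}

/*
 * Validate the all-pruned fallback relation.  The check runs and reports
 * failure in every build; a caller in the real executor would turn a false
 * result into elog(ERROR, ...) rather than relying on Assert() alone.
 */
static bool
toy_check_fallback_rel(const int *first_result_rels, int n, int fallback_rti)
{
    if (!toy_list_member_int(first_result_rels, n, fallback_rti))
        return false;           /* would be reported as an ERROR */
    return true;
}
```

With the list [5, 8, 11], a fallback of 8 passes and a fallback of 9 is rejected, independent of whether assertions are enabled.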
Thanks again for the review. A close look at these aspects by someone
other than me is very useful.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v13-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (8.8K, 2-v13-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
download | inline diff:
From 05c92346e2bec4c8ec9a7cf45ec572c15d64481f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v13 3/4] Introduce ExecutorPrep and refactor executor startup
Move permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper.
ExecutorStart() invokes ExecutorPrep() when QueryDesc->estate is
NULL, keeping current behavior unchanged. If QueryDesc->estate is
already set, ExecutorStart() reuses it.
This is preparatory refactoring only. No caller outside the
executor supplies a prebuilt EState in this commit.
In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/executor/README | 10 ++-
src/backend/executor/execMain.c | 152 ++++++++++++++++++++++++++------
src/include/executor/execdesc.h | 2 +-
3 files changed, 132 insertions(+), 32 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..890bc3d9333 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart, or implicitly from ExecutorStart
+ if not done earlier. Creates the EState in QueryDesc, performs
+ range table initialization, permission checks, and initial
+ partition pruning.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if QueryDesc.estate is NULL)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b30f768680..2b9397b72f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -76,6 +77,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
+static void ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags);
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
@@ -147,7 +149,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +174,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When no
+ * prep EState was provided, AcquireExecutorLocks() should have locked
+ * every relation in the plan. When one was provided, pruning-aware
+ * locking should have locked at least the unpruned relations. Both
+ * checks are skipped in parallel workers, which acquire relation locks
+ * lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ ExecutorPrep(queryDesc, CurrentResourceOwner, eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking should
+ * have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -274,6 +333,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep
+ *
+ * Build the initial executor state for queryDesc before ExecutorStart().
+ *
+ * This creates the EState and performs the subset of executor startup that
+ * does not require plan-tree initialization, allowing that work to be reused
+ * by callers that need executor state before ExecutorStart():
+ *
+ * - initialize the range table
+ * - perform permission checks
+ * - perform initial partition pruning
+ *
+ * On success, queryDesc->estate is set and can later be reused by
+ * ExecutorStart() instead of rebuilding the same state.
+ *
+ * Caller must ensure that queryDesc->snapshot is active.
+ */
+static void
+ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ PlannedStmt *pstmt;
+
+ Assert(queryDesc != NULL);
+
+ if (queryDesc->operation == CMD_UTILITY)
+ return;
+
+ Assert(ActiveSnapshotSet());
+ Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ Assert(queryDesc->estate == NULL);
+
+ pstmt = queryDesc->plannedstmt;
+
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = queryDesc->params;
+ estate->es_queryEnv = queryDesc->queryEnv;
+ estate->es_top_eflags = eflags;
+
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -849,37 +966,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 37c2576e4bc..aea5ec8ea02 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -45,7 +45,7 @@ typedef struct QueryDesc
int query_instr_options; /* OR of InstrumentOption flags for
* query_instr */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
--
2.47.3
[application/octet-stream] v13-0002-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 3-v13-0002-Refactor-executor-s-initial-partition-pruning-se.patch)
download | inline diff:
From 29e5ad113f6974a94fbcf984b43fa3ed86f57632 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v13 2/4] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
ExecCreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions would be added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() would have already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions must already be present in es_unpruned_relids
+ * when none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * es_param_exec_vals might not be available when
+ * CreatePartitionPruneState() is called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
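
The bookkeeping changed by the patch above can be modeled outside the executor. The sketch below is not PostgreSQL code: `ModelPruneInfo`, `model_create_prune_state()` and `model_do_initial_pruning()` are hypothetical names, and a 64-bit mask stands in for a Bitmapset (so it assumes RT indexes below 64). It only illustrates the division of labor described in the comments: the state-creation step adds all leaf partitions itself when initial pruning is skipped, and the caller contributes only the survivors when pruning runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one PartitionPruneInfo entry; not a PostgreSQL type. */
typedef struct ModelPruneInfo
{
	uint64_t	leaf_rtis;			/* bit i set => RT index i is a leaf */
	uint64_t	survivors;			/* leaves that initial pruning keeps */
	bool		has_initial_steps;	/* models initial_pruning_steps != NIL */
} ModelPruneInfo;

/*
 * Models CreatePartitionPruneState(): if initial pruning steps exist but are
 * being skipped, add every leaf partition to the unpruned set directly.
 * Returns whether initial pruning should run.
 */
static bool
model_create_prune_state(const ModelPruneInfo *info, bool skip_initial_pruning,
						 uint64_t *unpruned)
{
	bool		do_initial_prune = info->has_initial_steps && !skip_initial_pruning;

	if (info->has_initial_steps && !do_initial_prune)
		*unpruned |= info->leaf_rtis;
	return do_initial_prune;
}

/* Models the caller: only the pruning path contributes survivors. */
static uint64_t
model_do_initial_pruning(const ModelPruneInfo *info, bool skip_initial_pruning,
						 uint64_t unpruned)
{
	uint64_t	validsubplan_rtis = 0;

	if (model_create_prune_state(info, skip_initial_pruning, &unpruned))
		validsubplan_rtis = info->survivors & info->leaf_rtis;
	return unpruned | validsubplan_rtis;
}
```

When no initial pruning steps exist, neither function adds anything; the model, like the new assertion in the patch, expects the caller to have started with those partitions already in the set.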
[application/octet-stream] v13-0001-Move-execution-lock-acquisition-out-of-GetCached.patch (16.2K, 4-v13-0001-Move-execution-lock-acquisition-out-of-GetCached.patch)
From a3214580f2ce1983a111af07ccb092ba03c812c8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 18:38:34 +0900
Subject: [PATCH v13 1/4] Move execution lock acquisition out of
GetCachedPlan()
GetCachedPlan() previously acquired execution locks on all plan
relations as part of cached plan validation. Move this
responsibility to callers, making GetCachedPlan() return a valid
plan without holding execution locks.
Add AcquireExecutorLocks() as the caller-facing function: it locks
all relations in the plan, checks that the plan is still valid
afterward, and returns false if it was invalidated so the caller
can retry with a fresh plan.
For portal-backed callers, add PortalLockCachedPlan() in pquery.c
which wraps the lock-check-retry loop and handles the case where
replanning changes the portal strategy. Store the CachedPlanSource
pointer in PortalData so retry can call GetCachedPlan() without
the caller threading it through.
Adjust all non-portal GetCachedPlan() callers (SPI, EXPLAIN
EXECUTE, SQL functions) to call AcquireExecutorLocks() explicitly
after fetching the plan.
No behavioral change. This separates plan retrieval from execution
setup, allowing a later commit to substitute pruning-aware locking
for eligible plans.
---
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 14 +++++-
src/backend/executor/functions.c | 14 ++++--
src/backend/executor/spi.c | 22 +++++++--
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 70 ++++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 44 +++++++++++++-----
src/backend/utils/mmgr/portalmem.c | 7 +++
src/include/utils/plancache.h | 1 +
src/include/utils/portal.h | 3 ++
10 files changed, 157 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..03d7a98fc58 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ entry->plansource,
cplan);
/*
@@ -632,8 +633,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ for (;;)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner,
+ pstate->p_queryEnv);
+ plan_list = cplan->stmt_list;
+
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ }
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..2afb814a435 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -654,6 +654,7 @@ static bool
init_execution_state(SQLFunctionCachePtr fcache)
{
CachedPlanSource *plansource;
+ CachedPlan *cplan;
execution_state *preves = NULL;
execution_state *lasttages = NULL;
int nstmts;
@@ -696,10 +697,15 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
- fcache->cplan = GetCachedPlan(plansource,
- fcache->paramLI,
- fcache->cowner,
- NULL);
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, fcache->paramLI,
+ fcache->cowner, NULL);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, fcache->cowner);
+ }
+ fcache->cplan = cplan;
/*
* If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..268cd10bde8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ plansource,
cplan);
/*
@@ -2106,6 +2107,16 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
_SPI_current->queryEnv);
Assert(cplan == plansource->gplan);
+ if (!AcquireExecutorLocks(cplan))
+ {
+ /* Plan invalidated during locking; get a fresh one. */
+ ReleaseCachedPlan(cplan,
+ plan->saved ? CurrentResourceOwner : NULL);
+ cplan = GetCachedPlan(plansource, NULL,
+ plan->saved ? CurrentResourceOwner : NULL,
+ _SPI_current->queryEnv);
+ }
+
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2574,9 +2585,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
- cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, options->params,
+ plan_owner, _SPI_current->queryEnv);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, plan_owner);
+ }
stmt_list = cplan->stmt_list;
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbef734a93f..2929f158338 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1243,6 +1243,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2042,6 +2043,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ psrc,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index ee731000820..4699b53cab7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,6 +59,7 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
+static bool PortalLockCachedPlan(Portal portal);
/*
@@ -463,6 +464,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
portal->strategy = ChoosePortalStrategy(portal->stmts);
+restart:
+
/*
* Fire her up according to the strategy
*/
@@ -485,6 +488,21 @@ PortalStart(Portal portal, ParamListInfo params,
* non-default nesting level for the snapshot.
*/
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). If the plan is
+ * invalidated during locking, it replans and may change the
+ * portal strategy, requiring us to restart PortalStart().
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -535,6 +553,11 @@ PortalStart(Portal portal, ParamListInfo params,
case PORTAL_ONE_RETURNING:
case PORTAL_ONE_MOD_WITH:
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
/*
* We don't start the executor until we are told to run the
@@ -578,7 +601,20 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+
+ /*
+ * GetCachedPlan() no longer acquires execution locks, so we
+ * must do it here. Multi-statement plans always use
+ * conservative locking (all partitions locked); pruning-aware
+ * locking is not feasible because PortalRunMulti() executes
+ * statements sequentially with CCI between them.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -1786,3 +1822,35 @@ EnsurePortalSnapshotExists(void)
/* PushActiveSnapshotWithLevel might have copied the snapshot */
portal->portalSnapshot = GetActiveSnapshot();
}
+
+/*
+ * PortalLockCachedPlan
+ * Acquire execution locks for a cached-plan-backed portal,
+ * retrying with a fresh plan if the current one is invalidated.
+ *
+ * Returns true if replanning changed portal->strategy, meaning the
+ * caller must redispatch. Returns false once locks are held.
+ */
+static bool
+PortalLockCachedPlan(Portal portal)
+{
+ PortalStrategy start_strategy = portal->strategy;
+
+ if (AcquireExecutorLocks(portal->cplan))
+ return false;
+
+ /* Replan. Locks will be taken freshly. */
+ ReleaseCachedPlan(portal->cplan, portal->resowner);
+ portal->cplan = NULL;
+ portal->stmts = NIL;
+ portal->cplan = GetCachedPlan(portal->plansource,
+ portal->portalParams,
+ portal->resowner,
+ portal->queryEnv);
+ portal->stmts = portal->cplan->stmt_list;
+ portal->strategy = ChoosePortalStrategy(portal->stmts);
+ if (portal->strategy != start_strategy)
+ return true;
+
+ return false;
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..f7fe366859c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,7 +100,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksInt(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +945,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan. Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +984,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
-
/*
* If plan was transient, check to see if TransactionXmin has
* advanced, and if so invalidate it.
@@ -1003,9 +1002,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
/* Successfully revalidated and locked the query. */
return true;
}
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1282,8 +1278,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but no execution locks are held.
+ * The caller must call AcquireExecutorLocks() before executing.
+ * For freshly built plans (custom or new generic), the planner
+ * already holds the needed locks, so AcquireExecutorLocks() is
+ * redundant but harmless.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1906,9 +1905,11 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksInt(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1956,27 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * AcquireExecutorLocks
+ * Acquire execution locks on all relations in a cached plan.
+ *
+ * Returns true if the plan is still valid after locking. Returns
+ * false if the plan was invalidated while locks were being acquired,
+ * in which case the locks have been released and the caller should
+ * discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+bool
+AcquireExecutorLocks(CachedPlan *cplan)
+{
+ AcquireExecutorLocksInt(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocksInt(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..613f3be30b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,10 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If plansource is provided, it is the CachedPlanSource that produced
+ * cplan. PortalLockCachedPlan() uses it to fetch a fresh plan if the
+ * current one is invalidated during execution lock acquisition.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +290,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +304,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->plansource = plansource;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
@@ -517,6 +523,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* drop cached plan reference, if any */
PortalReleaseCachedPlan(portal);
+ portal->plansource = NULL;
/*
* If portal has a snapshot protecting its data, release that. This needs
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..e0fc403e717 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool AcquireExecutorLocks(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..3af535362cd 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,8 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ CachedPlanSource *plansource; /* CachedPlanSource, for replanning on
+ * invalidation */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +242,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
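
For readers who want to see the lock-check-retry contract of the patch above in isolation, here is a minimal, self-contained model. It is a sketch under stated assumptions, not the patch's code: `ModelPlan`, `ModelSource` and the `model_*` functions are hypothetical stand-ins for CachedPlan, CachedPlanSource, GetCachedPlan() and AcquireExecutorLocks(), and an invalidation is simulated by a counter rather than by real sinval processing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CachedPlan. */
typedef struct ModelPlan
{
	bool		is_valid;		/* cleared when an invalidation arrives */
	int			nrels;			/* relations the plan needs locked */
	int			locks_held;		/* locks currently held for this plan */
	int			generation;		/* bumped each time the plan is rebuilt */
} ModelPlan;

/* Hypothetical stand-in for a CachedPlanSource. */
typedef struct ModelSource
{
	ModelPlan	plan;
	int			pending_invalidations;	/* arrive while locks are taken */
} ModelSource;

/* Models GetCachedPlan(): returns a valid plan, but takes no locks. */
static ModelPlan *
model_get_cached_plan(ModelSource *src)
{
	assert(src != NULL);
	if (!src->plan.is_valid)
	{
		src->plan.is_valid = true;
		src->plan.generation++;
	}
	return &src->plan;
}

/*
 * Models AcquireExecutorLocks(): lock everything, then recheck validity.
 * On failure the locks are released before returning false.
 */
static bool
model_acquire_executor_locks(ModelSource *src, ModelPlan *plan)
{
	plan->locks_held = plan->nrels;
	if (src->pending_invalidations > 0)
	{
		/* Simulate an invalidation processed while waiting for a lock. */
		src->pending_invalidations--;
		plan->is_valid = false;
	}
	if (!plan->is_valid)
	{
		plan->locks_held = 0;
		return false;
	}
	return true;
}

/* The caller-side loop: fetch, lock, and retry until both succeed. */
static ModelPlan *
model_lock_with_retry(ModelSource *src, int *attempts)
{
	for (;;)
	{
		ModelPlan  *plan = model_get_cached_plan(src);

		(*attempts)++;
		if (model_acquire_executor_locks(src, plan))
			return plan;
		/* The real callers call ReleaseCachedPlan() here before retrying. */
	}
}
```

With two pending invalidations the loop makes three attempts and returns a plan of generation 2 with all three of its locks held.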
[application/octet-stream] v13-0004-Use-pruning-aware-locking-for-single-statement-c.patch (40.8K, 5-v13-0004-Use-pruning-aware-locking-for-single-statement-c.patch)
From 5785e0903b867f024e4b675783dfd76dc00ee733 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 20:43:14 +0900
Subject: [PATCH v13 4/4] Use pruning-aware locking for single-statement cached
plans
For single-statement reused generic plans, perform initial partition
pruning before acquiring execution locks, then lock only the
surviving partitions.
Add ExecutorPrepAndLock() which encapsulates the pruning-aware lock
sequence: lock unprunable relations, call ExecutorPrep() to run
initial pruning, then lock survivors. Plan validity is checked
after each step; ExecutorPrepCleanup() handles the case where the
plan is invalidated between prep and execution.
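
The sequence just described can be sketched as a small standalone model. Nothing here is PostgreSQL code: `ModelStmt`, `ModelLocks` and `model_prep_and_lock()` are hypothetical, relation sets are 64-bit masks indexed by RT index, and "pruning" is a precomputed mask. The model also folds in the firstResultRels rule from later in this message. Its purpose is to show the order of operations and where validity is rechecked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the parts of a PlannedStmt that matter here. */
typedef struct ModelStmt
{
	uint64_t	unprunable;			/* models unprunableRelids */
	uint64_t	first_result;		/* models firstResultRels, as a mask */
	uint64_t	prune_survivors;	/* prunable rels that pruning keeps */
	bool		prune_invalidates;	/* pruning expression invalidates plan */
} ModelStmt;

typedef struct ModelLocks
{
	uint64_t	held;				/* rels currently locked */
} ModelLocks;

static bool
model_prep_and_lock(const ModelStmt *stmt, ModelLocks *locks, bool *is_valid)
{
	uint64_t	unpruned;
	uint64_t	second_batch;

	/* Step 1: lock the rels that pruning itself may need to access. */
	locks->held |= stmt->unprunable;
	if (!*is_valid)
	{
		locks->held &= ~stmt->unprunable;
		return false;
	}

	/* Step 2: "run" initial pruning; it may invalidate the plan. */
	unpruned = stmt->unprunable | stmt->prune_survivors;
	if (stmt->prune_invalidates)
		*is_valid = false;

	/* Step 3: lock survivors, plus first result rels even if pruned. */
	second_batch = (unpruned & ~stmt->unprunable) | stmt->first_result;
	locks->held |= second_batch;
	if (!*is_valid)
	{
		/* Release everything taken in steps 1 and 3. */
		locks->held &= ~(second_batch | stmt->unprunable);
		return false;
	}
	return true;
}
```

As in the description, an invalidation caused during pruning is only noticed after the survivors have been locked, at which point both batches of locks are dropped.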
Extend PortalLockCachedPlan() to use the pruning-aware path for
eligible plans (single-statement reused generic, non-utility).
All other cases continue using the conservative lock-all path
from the previous commit.
Track firstResultRels in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving ExecInitModifyTable()
assumptions about the first result relation being available.
Multi-statement CachedPlans (from rule rewriting) always use
conservative locking, since PortalRunMulti() executes statements
sequentially with CCI between them and later statements' pruning
expressions may depend on earlier ones' effects. In principle,
this could be relaxed if the planner can prove that no pruning
expression reads state modified by an earlier statement, but that
is left for a future patch.
Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning results.
Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this
was an optimization; it is now a correctness requirement, because
pruned partitions may not be locked.
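
As an illustration of the rule in the note above, the following self-contained model shows the shape of the guard. It deliberately avoids the real executor API: `ModelEState`, `ModelRel` and `model_open_partition()` are hypothetical, and a 64-bit mask stands in for es_unpruned_relids. In real extension code the membership test would presumably be a bms_is_member() call on estate->es_unpruned_relids before opening the relation; treat that mapping as an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_RTI 63

/* Hypothetical stand-in for an opened relation. */
typedef struct ModelRel
{
	int			rti;
} ModelRel;

/* Hypothetical stand-in for the relevant parts of an EState. */
typedef struct ModelEState
{
	uint64_t	unpruned_relids;	/* models es_unpruned_relids */
	uint64_t	locked_relids;		/* rels the lock step actually locked */
	ModelRel	rels[MODEL_MAX_RTI + 1];
} ModelEState;

/*
 * Open a partition only if it survived pruning.  A pruned partition may be
 * unlocked, so the only safe answer for it is "not available".
 */
static ModelRel *
model_open_partition(ModelEState *estate, int rti)
{
	uint64_t	bit;

	assert(rti >= 1 && rti <= MODEL_MAX_RTI);
	bit = UINT64_C(1) << rti;

	if ((estate->unpruned_relids & bit) == 0)
		return NULL;			/* pruned: do not touch it */

	/* Invariant the guard relies on: unpruned rels are locked. */
	assert((estate->locked_relids & bit) != 0);
	estate->rels[rti].rti = rti;
	return &estate->rels[rti];
}
```

Callers then treat a NULL result as "skip this partition" rather than as an error.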
---
src/backend/commands/explain.c | 45 +++--
src/backend/commands/prepare.c | 30 ++-
src/backend/executor/execMain.c | 142 ++++++++++++++
src/backend/executor/nodeModifyTable.c | 7 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 19 ++
src/backend/tcop/pquery.c | 76 ++++++--
src/backend/utils/cache/plancache.c | 16 ++
src/include/commands/explain.h | 3 +-
src/include/executor/executor.h | 4 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 2 +
src/test/regress/expected/partition_prune.out | 184 ++++++++++++++++++
src/test/regress/expected/plancache.out | 63 ++++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++++
src/test/regress/sql/plancache.sql | 52 +++++
17 files changed, 734 insertions(+), 39 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 112c17b0d64..c5254f0f920 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -377,7 +377,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ NULL);
}
/*
@@ -501,7 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters)
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -532,13 +534,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
/*
* We discard the output if we have no use for it. If we're explaining
* CREATE TABLE AS, we'd better use the appropriate tuple receiver, while
@@ -554,10 +549,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
else
dest = None_Receiver;
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ /*
+ * Create a QueryDesc for the query, or use the one provided by the
+ * caller. When reusing a prep QueryDesc, its snapshot was set at
+ * creation time; we push it as active for ExecutorStart and override the
+ * destination and instrument options, which were not known when the
+ * caller created it.
+ */
+ if (prep_qd)
+ {
+ PushActiveSnapshot(GetActiveSnapshot());
+ queryDesc = prep_qd;
+ Assert(queryDesc->dest == None_Receiver);
+ queryDesc->dest = dest;
+ queryDesc->instrument_options = instrument_option;
+ }
+ else
+ {
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv,
+ instrument_option);
+ }
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 03d7a98fc58..3bbbc052149 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -588,6 +588,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ QueryDesc *prep_qd = NULL;
if (es->memory)
{
@@ -640,8 +641,31 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
pstate->p_queryEnv);
plan_list = cplan->stmt_list;
- if (AcquireExecutorLocks(cplan))
+ if (!CachedPlanCanPrep(cplan, entry->plansource))
+ {
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ continue;
+ }
+
+ prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, plan_list),
+ query_string,
+ GetActiveSnapshot(),
+ InvalidSnapshot,
+ None_Receiver, /* ExplainOnePlan will fix */
+ paramLI,
+ pstate->p_queryEnv,
+ 0 /* ExplainOnePlan will fix */ );
+ if (ExecutorPrepAndLock(prep_qd,
+ CurrentResourceOwner,
+ es->generic ? EXEC_FLAG_EXPLAIN_GENERIC : 0,
+ &cplan->is_valid))
break;
+
+ /* Try again. */
+ ExecutorPrepCleanup(prep_qd);
+ FreeQueryDesc(prep_qd);
ReleaseCachedPlan(cplan, CurrentResourceOwner);
}
@@ -664,6 +688,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(prep_qd == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
@@ -671,7 +696,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ prep_qd);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2b9397b72f3..bbfa0e2b92a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -333,6 +333,124 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * LockRangeTableRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksPrepared() and ExecutorPrepAndLock().
+ */
+static void
+LockRangeTableRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not fail
+ * if it's been dropped entirely --- we'll just transiently acquire a
+ * non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksPrepared
+ *
+ * Acquire or release execution locks using pruning results already computed
+ * by ExecutorPrep() and stored in queryDesc->estate.
+ *
+ * This is intended for single-statement reused generic-plan paths that
+ * choose pruning-aware locking instead of the conservative
+ * AcquireExecutorLocks() path.
+ */
+static void
+AcquireExecutorLocksPrepared(QueryDesc *queryDesc, bool acquire)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ Bitmapset *lock_relids;
+ ListCell *lc;
+
+ Assert(queryDesc != NULL);
+ Assert(estate != NULL);
+ Assert(plannedstmt != NULL);
+ Assert(plannedstmt->commandType != CMD_UTILITY);
+
+ lock_relids = bms_difference(estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Keep the first result relation of each ModifyTable locked even if
+ * pruning removed all target partitions. ExecInitModifyTable() relies on
+ * one such relation remaining available.
+ */
+ foreach(lc, plannedstmt->firstResultRels)
+ {
+ Index rti = lfirst_int(lc);
+
+ lock_relids = bms_add_member(lock_relids, rti);
+ }
+
+ LockRangeTableRelids(plannedstmt->rtable, lock_relids, acquire);
+
+ bms_free(lock_relids);
+
+}
+
+/*
+ * ExecutorPrepAndLock
+ * Perform pruning-aware locking for a single PlannedStmt.
+ *
+ * Locks unprunable relations first, then runs ExecutorPrep() to
+ * determine which partitions survive initial pruning, then locks
+ * only those survivors. Checks *is_valid after each locking step
+ * to detect plan invalidation (e.g., from concurrent DDL or DDL
+ * triggered by a pruning expression).
+ *
+ * Returns true if the plan is still valid and all needed locks are
+ * held. Returns false if the plan was invalidated at any point, in
+ * which case all acquired locks have been released and the caller
+ * should discard the QueryDesc and retry with a fresh plan.
+ */
+bool
+ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid)
+{
+ PlannedStmt *pstmt = queryDesc->plannedstmt;
+
+ /* Lock unprunable rels before pruning can access them. */
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, true);
+ if (!*is_valid)
+ {
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ /* Run pruning and lock survivors. */
+ ExecutorPrep(queryDesc, owner, eflags);
+ AcquireExecutorLocksPrepared(queryDesc, true);
+ if (!*is_valid)
+ {
+ AcquireExecutorLocksPrepared(queryDesc, false);
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ return true;
+}
+
/*
* ExecutorPrep
*
@@ -391,6 +509,30 @@ ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
CurrentResourceOwner = oldowner;
}
+/*
+ * ExecutorPrepCleanup
+ * Clean up an EState that was created by ExecutorPrep() but never
+ * passed to ExecutorStart(). This happens when the plan is
+ * invalidated between prep and execution, and the caller must
+ * discard the prepped state before retrying with a fresh plan.
+ *
+ * Unlike ExecutorEnd(), this does not expect a fully initialized
+ * plan state tree -- only the range table relations and the
+ * EState itself need to be freed.
+ */
+void
+ExecutorPrepCleanup(QueryDesc *queryDesc)
+{
+ EState *estate = queryDesc->estate;
+
+ if (estate == NULL)
+ return;
+
+ ExecCloseRangeTableRelations(estate);
+ FreeExecutorState(estate);
+ queryDesc->estate = NULL;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 478cb01783c..6e78b61f700 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -5133,8 +5133,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksPrepared(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -5148,6 +5148,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ if (!list_member_int(estate->es_plannedstmt->firstResultRels, rti))
+ elog(ERROR, "first result relation %u not found in firstResultRels",
+ rti);
i = 0;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f4689e7c9f8..4cddac7f2fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -675,6 +675,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ff0e875f2a2..4495bc6e627 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,25 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of initially
+ * prunable relations. We use bms_next_member() to get the
+ * lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because
+ * expand_inherited_rtentry() adds child partitions to the range table
+ * sequentially in partition bound order, and resultRelations is built
+ * from that same expansion.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4699b53cab7..53c50ab0fce 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,7 +59,9 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
-static bool PortalLockCachedPlan(Portal portal);
+static bool PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **queryDesc_p);
/*
@@ -488,21 +490,6 @@ restart:
* non-default nesting level for the snapshot.
*/
- /*
- * If the portal is backed by a cached plan, acquire execution
- * locks via PortalLockCachedPlan(). If the plan is
- * invalidated during locking, it replans and may change the
- * portal strategy, requiring us to restart PortalStart().
- */
- if (portal->cplan)
- {
- if (PortalLockCachedPlan(portal))
- {
- PopActiveSnapshot();
- goto restart;
- }
- }
-
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -516,6 +503,26 @@ restart:
portal->queryEnv,
0);
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). For eligible plans
+ * (single-statement reused generic), this performs
+ * pruning-aware locking: it runs ExecutorPrep() on the
+ * QueryDesc to determine which partitions survive initial
+ * pruning, then locks only those. If the plan is invalidated
+ * during this process, it replans and rebuilds the QueryDesc.
+ * If replanning changes the portal strategy, we must restart
+ * PortalStart() to redispatch.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal, true, params, &queryDesc))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
@@ -555,7 +562,7 @@ restart:
case PORTAL_ONE_MOD_WITH:
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -611,7 +618,7 @@ restart:
*/
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -1828,15 +1835,32 @@ EnsurePortalSnapshotExists(void)
* Acquire execution locks for a cached-plan-backed portal,
* retrying with a fresh plan if the current one is invalidated.
*
+ * If do_prep is true and the plan is eligible (single-statement reused
+ * generic plan), performs pruning-aware locking via ExecutorPrep() and
+ * populates portal->queryDesc with the prepped QueryDesc. Otherwise
+ * falls back to locking all relations in the plan.
+ *
* Returns true if replanning changed portal->strategy, meaning the
- * caller must redispatch. Returns false once locks are held.
+ * caller must redispatch. Returns false once locks are held and the
+ * plan is valid for execution.
*/
static bool
-PortalLockCachedPlan(Portal portal)
+PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd)
{
PortalStrategy start_strategy = portal->strategy;
- if (AcquireExecutorLocks(portal->cplan))
+ if (do_prep && CachedPlanCanPrep(portal->cplan, portal->plansource))
+ {
+ Assert(prep_qd);
+ if (ExecutorPrepAndLock(*prep_qd, portal->resowner, 0,
+ &portal->cplan->is_valid))
+ return false;
+ ExecutorPrepCleanup(*prep_qd);
+ FreeQueryDesc(*prep_qd);
+ }
+ else if (AcquireExecutorLocks(portal->cplan))
return false;
/* Replan. Locks will be taken freshly. */
@@ -1852,5 +1876,15 @@ PortalLockCachedPlan(Portal portal)
if (portal->strategy != start_strategy)
return true;
+ if (prep_qd)
+ {
+ Assert(list_length(portal->stmts) == 1);
+ *prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->sourceText,
+ GetActiveSnapshot(), InvalidSnapshot,
+ None_Receiver, params,
+ portal->queryEnv, 0);
+ }
+
return false;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f7fe366859c..fca2f84081e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1977,6 +1977,22 @@ AcquireExecutorLocks(CachedPlan *cplan)
return true;
}
+/*
+ * CachedPlanCanPrep
+ * Check whether a cached plan is eligible for pruning-aware locking
+ * via ExecutorPrepAndLock().
+ *
+ * Only single-statement reused generic plans with a non-utility command
+ * qualify.
+ */
+bool
+CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource)
+{
+ return (cplan == plansource->gplan &&
+ list_length(cplan->stmt_list) == 1 &&
+ linitial_node(PlannedStmt, cplan->stmt_list)->commandType != CMD_UTILITY);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..3a03355e6b6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -69,7 +69,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters);
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 33bbdbfeffb..093be9bd24b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid);
+extern void ExecutorPrepCleanup(QueryDesc *queryDesc);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27a2c6815b7..a5d00633b4b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 14a1dfed2b9..1a328ea138c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -120,6 +120,16 @@ typedef struct PlannedStmt
/* RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE */
Bitmapset *resultRelationRelids;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksPrepared() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e0fc403e717..2941d3a301b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -254,4 +254,6 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+extern bool CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource);
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 849049f9c51..ec73866486e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4956,3 +4956,187 @@ select * from (select a, b from phv_boolpart) t
(2 rows)
drop table phv_boolpart;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index d58534ca1cd..54077294dce 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -402,3 +402,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 359a9208056..a98844d14f8 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1518,3 +1518,119 @@ select * from (select a, b from phv_boolpart) t
group by grouping sets (a, b);
drop table phv_boolpart;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index aed388d03a1..90b6c5f82bf 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -228,3 +228,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
* Re: generic plans and "initial" pruning
2024-12-04 17:20 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-05 06:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 11:28 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-05 14:07 ` Re: generic plans and "initial" pruning Tomas Vondra <[email protected]>
2024-12-06 08:18 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-09 07:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2024-12-12 07:58 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-23 07:15 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-01-31 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-06 02:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-12 11:53 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-21 03:40 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-22 15:00 ` Re: generic plans and "initial" pruning Alexander Lakhin <[email protected]>
2025-02-22 17:02 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-23 08:35 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-02-23 12:46 ` Re: generic plans and "initial" pruning Tender Wang <[email protected]>
2025-02-25 02:51 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 03:06 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-20 13:25 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-20 15:38 ` Re: generic plans and "initial" pruning Tom Lane <[email protected]>
2025-05-21 10:22 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-05-22 08:12 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-06-20 12:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-17 12:11 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-07-22 06:43 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-12 14:17 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-17 12:50 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-20 07:30 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2025-11-24 03:29 ` Re: generic plans and "initial" pruning Chao Li <[email protected]>
2025-11-25 08:31 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-02-11 04:05 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-07 09:54 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-09 04:41 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-19 17:20 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-25 07:39 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-26 09:24 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-03-27 09:00 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-04-04 12:10 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-05-27 12:03 ` Re: generic plans and "initial" pruning Thom Brown <[email protected]>
2026-05-28 08:13 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
2026-05-28 13:13 ` Re: generic plans and "initial" pruning Thom Brown <[email protected]>
2026-05-29 08:56 ` Re: generic plans and "initial" pruning Amit Langote <[email protected]>
@ 2026-05-29 10:30 ` Thom Brown <[email protected]>
2026-06-02 17:54 ` Re: generic plans and "initial" pruning Ilmar Yunusov <[email protected]>
0 siblings, 1 reply; 66+ messages in thread
From: Thom Brown @ 2026-05-29 10:30 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Fri, 29 May 2026 at 09:57, Amit Langote <[email protected]> wrote:
>
> On Thu, May 28, 2026 at 10:14 PM Thom Brown <[email protected]> wrote:
> > On Thu, 28 May 2026 at 09:14, Amit Langote <[email protected]> wrote:
> > > It's a real bug.
> > >
> > > You're right that if PortalLockCachedPlan() replans, the QueryDesc
> > > created before the call still points at the old PlannedStmt from the
> > > released plan. And yes, 0004 happens to fix it by rebuilding the
> > > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> > > broken on their own.
> > >
> > > Attached is an updated set with the fix: CreateQueryDesc now runs
> > > after PortalLockCachedPlan() returns, as you suggested. That said,
> > > I'll probably focus first on settling the plancache refactoring that
> > > spun off from this thread [1], and then start a new thread for the
> > > pruning-aware locking work on top of it, incorporating parts of this
> > > series.
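
The stale-QueryDesc hazard described above can be modelled in a few lines of
standalone C. This is only a sketch under stated assumptions, not PostgreSQL
code: Plan, Portal, Desc, lock_or_replan(), make_desc() and desc_is_current()
are all hypothetical stand-ins for the CachedPlan, the portal, the QueryDesc
and PortalLockCachedPlan().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a planned statement owned by a cached plan. */
typedef struct Plan
{
    int  generation;
    bool is_valid;
} Plan;

/* Hypothetical portal: slot 0 is the cached plan, slot 1 the replanned one. */
typedef struct Portal
{
    Plan  plans[2];
    Plan *current;
} Portal;

/* Hypothetical descriptor that captures a pointer to the plan it was built from. */
typedef struct Desc
{
    const Plan *plan;
} Desc;

/*
 * Toy locking step: if the current plan turned out to be invalid, switch
 * the portal to the fresh plan.  Returns true if a replan happened.
 */
static bool
lock_or_replan(Portal *portal)
{
    assert(portal->current != NULL);
    if (portal->current->is_valid)
        return false;
    portal->current = &portal->plans[1];
    return true;
}

/* Build a descriptor from whatever plan the portal points at right now. */
static Desc
make_desc(const Portal *portal)
{
    Desc d = { portal->current };

    return d;
}

/* A descriptor is usable only if it still refers to the portal's current plan. */
static bool
desc_is_current(const Portal *portal, const Desc *desc)
{
    return desc->plan == portal->current;
}
```

A descriptor built before lock_or_replan() keeps pointing at slot 0 after a
replan, while one built afterwards points at the fresh plan; that ordering is
the point of moving the descriptor creation after the locking call.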
> >
> > Thanks.
> >
> > I've done another pass. I see a reference to
> > AcquireExecutorLocksUnpruned(), but I can't find this function. Is
> > this supposed to be AcquireExecutorLocksPrepared()?
>
> You're right, stale comment. It should say
> AcquireExecutorLocksPrepared(). Fixed.
>
> > And also I have a question about the new firstResultRels code
> >
> > If I've followed it right, the bit in setrefs.c records the
> > lowest-numbered RT index from leaf_result_relids as the
> > per-ModifyTable fallback that's used when all real targets get pruned
> > away, and the executor side looks it up via
> > linitial_int(node->resultRelations). For that to work those two have
> > to pick the same RT index, and the comment justifies it with
> > "partition expansion preserves RT index order". Where is that
> > preservation guaranteed?
>
> The ordering comes from expand_inherited_rtentry(), which adds child
> partitions to the range table sequentially in partition bound order.
> Since ModifyTable.resultRelations is built from the same expansion,
> its first element is the lowest-numbered RT index among the leaf
> partitions for that node. That is the same value
> bms_next_member(leaf_result_relids, -1) returns from the Bitmapset,
> because Bitmapset iteration returns members in ascending order. I've
> added a comment in setrefs.c pointing to expand_inherited_rtentry() as
> the source of this guarantee.
>
> > And with the assertion in ExecInitModifyTable:
> >
> > Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
> >
> > With writable CTEs producing more than one ModifyTable node the list
> > has several entries, so all the assert really checks is that some
> > recorded entry matches, not that the one recorded for this particular
> > node matches. If that's correct, then in a case where the wrong entry
> > happened to line up the right relation wouldn't be locked and nothing
> > would complain. Is there something that keeps these in order
> > somewhere?
>
> This is a fair observation -- the Assert checks membership in the
> global list rather than per-node correspondence. But node A's rti
> can't accidentally pass the Assert by matching an entry recorded for
> node B. Each ModifyTable node gets its own partition expansion with
> distinct RT entries. In a writable CTE like:
>
> WITH upd1 AS (UPDATE t SET ...),
> upd2 AS (UPDATE t SET ...)
> UPDATE t SET ...
>
> each UPDATE creates a separate set of leaf partition RT entries --
> upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE
> gets 11,12,13. The global firstResultRels list would be [5, 8, 11].
> When ExecInitModifyTable falls back to linitial_int(resultRelations)
> for a given node, it finds that node's own entry, because the RT index
> sets are disjoint across nodes.
>
> That said, it's worth being explicit about what protections exist at
> each layer, since this is safety-critical code:
>
> 1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in
> firstResultRels unconditionally. So regardless of which rti a
> ModifyTable node falls back to, the relation will be locked.
>
> 2. ExecGetRangeTableRelation() has two checks when opening a relation.
> For non-result relations (isResultRel=false), it checks
> es_unpruned_relids and raises an ERROR in release builds if the
> relation was pruned. For result relations (isResultRel=true), that
> check is intentionally skipped -- it has to be, because at least one
> result relation per ModifyTable node must remain openable even when
> all partitions are pruned, since executor code paths like ExecMerge()
> and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized
> (see commit 28317de723b). The remaining protection for result
> relations is Assert(CheckRelationLockedByMe()) inside table_open,
> which fires in debug builds.
>
> 3. I've tightened ExecInitModifyTable to close this gap: the
> all-pruned fallback path now raises an elog(ERROR) in release builds
> if linitial_int(resultRelations) is not found in firstResultRels,
> rather than just an Assert. This gives result relations a
> production-visible check comparable to what es_unpruned_relids
> provides for scan relations.
>
> So the net effect is that for scan relations, opening a
> pruned-and-unlocked relation is caught by an ERROR in production via
> es_unpruned_relids. For result relations on the all-pruned fallback
> path, it's now also caught by an ERROR in production via the
> firstResultRels check in ExecInitModifyTable. The locking in
> AcquireExecutorLocksPrepared() ensures the relation is always locked
> regardless.
>
> Thanks again for the review. A close look at these aspects by someone
> other than me is very useful.
Ah, the disjoint RT-entries point is what I was missing. I'd been
reading firstResultRels as a flat list where in theory any entry could
line up with any node's lookup, which is what made the assert feel
potentially insufficient. If each ModifyTable's expansion produces its
own non-overlapping set of leaf RT indexes then membership in the
global list really is equivalent to membership in this node's own
entry, and the assert is sufficient as it stands. Walking through the
writable-CTE case helped.
Thanks
Thom
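
The disjointness argument above can be checked with a small toy model. This is
only a sketch under stated assumptions: it does not use PostgreSQL's Bitmapset
or List types, every toy_* name is hypothetical, and the RT indexes are the
illustrative values from the writable-CTE example (5,6,7 / 8,9,10 / 11,12,13).
It shows that when each node's leaf RT indexes are disjoint, a node's first
result relation matches the global list at that node's own position and
nowhere else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member.
 * Only RT indexes below 64 are supported in this sketch. */
typedef uint64_t ToyRelids;

/* Lowest member of the set, or -1 if empty. Plays the role that
 * bms_next_member(set, -1) has in the discussion above. */
int toy_lowest_member(ToyRelids set)
{
    for (int i = 0; i < 64; i++)
        if (set & (UINT64_C(1) << i))
            return i;
    return -1;
}

/* "Planner" side of the toy: the entry recorded for one ModifyTable node
 * is the lowest RT index among that node's leaf result relations. */
int toy_first_result_rel(const int *result_rels, int nrels)
{
    ToyRelids leaf = 0;

    for (int i = 0; i < nrels; i++)
        leaf |= UINT64_C(1) << result_rels[i];
    return toy_lowest_member(leaf);
}

/* Plain membership test over an int array. */
bool toy_list_member_int(const int *list, int n, int value)
{
    for (int i = 0; i < n; i++)
        if (list[i] == value)
            return true;
    return false;
}

/* "Executor" side of the toy: for each node, take the first element of its
 * own result relation list and verify that it appears in the global list
 * at this node's position and at no other position. */
bool toy_check_per_node_match(void)
{
    const int nodes[3][3] = {{5, 6, 7}, {8, 9, 10}, {11, 12, 13}};
    int first_result_rels[3];

    for (int n = 0; n < 3; n++)
        first_result_rels[n] = toy_first_result_rel(nodes[n], 3);

    for (int n = 0; n < 3; n++)
    {
        int rti = nodes[n][0];

        if (!toy_list_member_int(first_result_rels, 3, rti))
            return false;
        for (int m = 0; m < 3; m++)
            if ((first_result_rels[m] == rti) != (m == n))
                return false;
    }
    return true;
}
```

If two nodes shared an RT index, the final loop would report a mismatch, which
is the case the membership-only Assert could not distinguish.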
* Re: generic plans and "initial" pruning
From: Ilmar Yunusov @ 2026-06-02 17:54 UTC (permalink / raw)
To: [email protected]; +Cc: Amit Langote <[email protected]>
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, failed
Spec compliant: not tested
Documentation: not tested
Hi,

I looked at v13, focusing on apply/build status and relation-lock behavior for
reused generic plans after initial partition pruning.

I used the v13 series from Amit's 2026-05-29 message, on origin/master at
4b0bf0788b066a4ca1d4f959566678e44ec93422.

The series applies cleanly with git am, and git diff --check reports no
issues.

I first built with:

    ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
    make -s -j8
    make -s install

    make -C src/test/regress check

passed; all 245 tests passed, including plancache and partition_prune.

I also built a cassert/debug tree with:

    ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-cassert --enable-debug 'CFLAGS=-O0 -g'
    make -s -j8
    make -s install

and ran:

    make -C src/test/regress check

which also passed; all 245 tests passed.

For the lock behavior, I used a list-partitioned table with force_generic_plan.
After the generic plan had been built and then reused, EXECUTE held only the
matching child partition lock. For example, EXECUTE q(1) held only the
following child lock:

    manual_prunelock_p1

EXPLAIN EXECUTE behaved the same way on a reused generic plan; EXPLAIN EXECUTE
q(2) removed the other subplans and held only the following child lock:

    manual_prunelock_p2

With enable_partition_pruning = off and a newly prepared statement, executing
the same SELECT held all child partition locks:

    manual_prunelock_p1, manual_prunelock_p2, manual_prunelock_p3

I also ran a bounded cassert/debug stress check around plan invalidation. It
did 20 cycles where a child index was created and dropped before EXECUTE, and
20 similar cycles before EXPLAIN EXECUTE. In each cycle, the first execution
after invalidation/replanning held all child partition locks, and the next
execution reusing the generic plan held only the matching child partition lock.
That matches my reading that the patch is reducing locks for reused generic
plans, not for the execution that has to rebuild the plan.

One behavior I wanted to confirm: prepared UPDATE execution still held all
child partition locks in my manual check, including on the second execution
where the generic plan was being reused.

The test was:

    prepare upd(int, text) as
      update stress_prunelock_p set b = $2 where a = $1;

Then both:

    execute upd(3, 'updated-row-3');

and an all-pruned value:

    execute upd(99, 'no-row');

held:

    stress_prunelock_p1, stress_prunelock_p2, stress_prunelock_p3,
    stress_prunelock_p4

pg_prepared_statements showed generic_plans increasing for this prepared
statement, so this was not a custom-plan case.

Is this expected for ModifyTable/result relations in v13, or did I miss an
eligibility condition that prevents pruning-aware locking from being used for
this prepared UPDATE case? I saw the recent firstResultRels discussion, but I
was not sure whether those changes are intended only to make pruned
result-relation initialization safe, or whether actual prepared DML execution
is expected to see reduced child partition locking as well.

I did not review the broader plancache refactoring design, did not run
installcheck-world, and did not test concurrent DDL from a separate session.

Regards,
Ilmar Yunusov
The new status of this patch is: Waiting on Author
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2026-06-04 00:25 UTC (permalink / raw)
To: Ilmar Yunusov <[email protected]>; +Cc: [email protected]
Hi Ilmar,
On Wed, Jun 3, 2026 at 2:55 AM Ilmar Yunusov <[email protected]> wrote:
>
> I looked at v13, focusing on apply/build status and relation-lock behavior for
> reused generic plans after initial partition pruning.
>
> I used the v13 series from Amit's 2026-05-29 message, on origin/master at
> 4b0bf0788b066a4ca1d4f959566678e44ec93422.
>
> The series applies cleanly with git am, and git diff --check reports no
> issues.
>
> I first built with:
>
> ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
> make -s -j8
> make -s install
>
> make -C src/test/regress check
>
> passed; all 245 tests passed, including plancache and partition_prune.
>
> I also built a cassert/debug tree with:
>
> ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-cassert --enable-debug 'CFLAGS=-O0 -g'
> make -s -j8
> make -s install
>
> and ran:
>
> make -C src/test/regress check
>
> which also passed; all 245 tests passed.
>
> For the lock behavior, I used a list-partitioned table with force_generic_plan.
> After the generic plan had been built and then reused, EXECUTE held only the
> matching child partition lock. For example, EXECUTE q(1) held only the
> following child lock:
>
> manual_prunelock_p1
>
> EXPLAIN EXECUTE behaved the same way on a reused generic plan; EXPLAIN EXECUTE
> q(2) removed the other subplans and held only the following child lock:
>
> manual_prunelock_p2
>
> With enable_partition_pruning = off and a newly prepared statement, executing
> the same SELECT held all child partition locks:
>
> manual_prunelock_p1, manual_prunelock_p2, manual_prunelock_p3
>
> I also ran a bounded cassert/debug stress check around plan invalidation. It
> did 20 cycles where a child index was created and dropped before EXECUTE, and
> 20 similar cycles before EXPLAIN EXECUTE. In each cycle, the first execution
> after invalidation/replanning held all child partition locks, and the next
> execution reusing the generic plan held only the matching child partition lock.
> That matches my reading that the patch is reducing locks for reused generic
> plans, not for the execution that has to rebuild the plan.
Thanks for the thorough testing.
> One behavior I wanted to confirm: prepared UPDATE execution still held all
> child partition locks in my manual check, including on the second execution
> where the generic plan was being reused.
>
> The test was:
>
> prepare upd(int, text) as
> update stress_prunelock_p set b = $2 where a = $1;
>
> Then both:
>
> execute upd(3, 'updated-row-3');
>
> and an all-pruned value:
>
> execute upd(99, 'no-row');
>
> held:
>
> stress_prunelock_p1, stress_prunelock_p2, stress_prunelock_p3,
> stress_prunelock_p4
>
> pg_prepared_statements showed generic_plans increasing for this prepared
> statement, so this was not a custom-plan case.
>
> Is this expected for ModifyTable/result relations in v13, or did I miss an
> eligibility condition that prevents pruning-aware locking from being used for
> this prepared UPDATE case? I saw the recent firstResultRels discussion, but I
> was not sure whether those changes are intended only to make pruned
> result-relation initialization safe, or whether actual prepared DML execution
> is expected to see reduced child partition locking as well.
Yes, this is expected; the pruning-aware path currently only kicks in
for the portal strategy used by SELECT. I hadn't noticed that
UPDATE/DELETE ends up on a different strategy that bypasses the new
pruning-aware locking path. I need to think about how best to handle
this; the DML portal strategies defer executor startup to a later
point, so it may require some restructuring.
--
Thanks, Amit Langote
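
The strategy split described above can be pictured with a toy classifier. This
is a rough sketch, not code from the patch: the toy_* names are hypothetical,
the branching is a simplified reading of how a single-statement portal picks
its strategy, and the rule that only the plain-SELECT strategy gets
pruning-aware locking is taken from the statement above about v13, not from
the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command tags for the toy model. */
typedef enum ToyCmd
{
    TOY_SELECT,
    TOY_INSERT,
    TOY_UPDATE,
    TOY_DELETE,
    TOY_MERGE
} ToyCmd;

/* Hypothetical strategy labels, loosely modeled on the portal strategies. */
typedef enum ToyStrategy
{
    TOY_ONE_SELECT,     /* plain SELECT */
    TOY_ONE_MOD_WITH,   /* SELECT containing a data-modifying CTE */
    TOY_ONE_RETURNING,  /* DML with RETURNING */
    TOY_MULTI_QUERY     /* everything else, including plain DML */
} ToyStrategy;

/* Simplified strategy choice for a single statement. */
ToyStrategy toy_choose_strategy(ToyCmd cmd, bool has_returning,
                                bool has_modifying_cte)
{
    if (cmd == TOY_SELECT)
        return has_modifying_cte ? TOY_ONE_MOD_WITH : TOY_ONE_SELECT;
    return has_returning ? TOY_ONE_RETURNING : TOY_MULTI_QUERY;
}

/* Assumption drawn from the reply above: in v13 only the strategy used by
 * plain SELECT reaches the pruning-aware locking path. */
bool toy_uses_pruning_aware_locking(ToyStrategy strategy)
{
    return strategy == TOY_ONE_SELECT;
}
```

Under this model the prepared UPDATE from the review (no RETURNING) lands on
the multi-query strategy, which would be consistent with all child partition
locks being held even when the generic plan is reused.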
* Re: generic plans and "initial" pruning
From: Robert Haas @ 2025-05-22 13:50 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Amit Langote <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Tue, May 20, 2025 at 11:38 AM Tom Lane <[email protected]> wrote:
> I still like the core idea of deferring locking, but I don't like
> anything about this implementation of it. It seems like there has
> to be a better and simpler way.
Without particularly defending this implementation, and certainly
without defending its bugs, I just want to say that I'm not convinced
by the idea that there has to be a better and simpler way. We --
principally Amit, but also me and you and others -- have been trying
to find the best way of doing this for probably 5 years now. If you do
something during executor startup, you have to be prepared for
executor startup to force a replan, and if you do something before
executor startup, then you're duplicating executor logic into a new
phase that needs to communicate its results forward to execution
proper. Either approach is awkward and that awkwardness seems to
inevitably bleed into the plan cache specifically. I'd be beyond
delighted if you want to help chart a path through the awkwardness
here, since you know this stuff better than anybody, but I am
skeptical that there is a truly marvelous approach which we've just
managed to overlook for all this time.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
From: Alexander Lakhin @ 2025-02-14 21:00 UTC (permalink / raw)
To: Amit Langote <[email protected]>; Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
Hello Amit,
06.02.2025 04:35, Amit Langote wrote:
> I plan to push 0001 tomorrow, barring any objections.
>
Please try the following script:
CREATE TABLE pt (a int, b int) PARTITION BY range (a);
CREATE TABLE tp1 PARTITION OF pt FOR VALUES FROM (1) TO (2);
CREATE TABLE tp2 PARTITION OF pt FOR VALUES FROM (2) TO (3);
MERGE INTO pt
USING (SELECT pg_backend_pid() AS pid) AS q JOIN tp1 ON (q.pid = tp1.a)
ON pt.a = tp1.a
WHEN MATCHED THEN DELETE;
which fails for me with segfault:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
3680 relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
(gdb) bt
#0 ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
#1 0x00005a9b67e6dfb5 in ExecInitModifyTable (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at nodeModifyTable.c:4906
#2 0x00005a9b67e273f7 in ExecInitNode (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at execProcnode.c:177
#3 0x00005a9b67e1b9d2 in InitPlan (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:1092
#4 0x00005a9b67e1a524 in standard_ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:268
#5 0x00005a9b67e1a223 in ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:142
...
starting from cbc127917.
(I've discovered this anomaly with SQLsmith.)
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-15 07:51 UTC (permalink / raw)
To: Alexander Lakhin <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
Hi Alexander,
On Sat, Feb 15, 2025 at 6:00 AM Alexander Lakhin <[email protected]> wrote:
>
> Hello Amit,
>
> 06.02.2025 04:35, Amit Langote wrote:
>
> I plan to push 0001 tomorrow, barring any objections.
>
>
> Please try the following script:
> CREATE TABLE pt (a int, b int) PARTITION BY range (a);
> CREATE TABLE tp1 PARTITION OF pt FOR VALUES FROM (1) TO (2);
> CREATE TABLE tp2 PARTITION OF pt FOR VALUES FROM (2) TO (3);
>
> MERGE INTO pt
> USING (SELECT pg_backend_pid() AS pid) AS q JOIN tp1 ON (q.pid = tp1.a)
> ON pt.a = tp1.a
> WHEN MATCHED THEN DELETE;
>
> which fails for me with segfault:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
> 3680 relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
> (gdb) bt
> #0 ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
> #1 0x00005a9b67e6dfb5 in ExecInitModifyTable (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at nodeModifyTable.c:4906
> #2 0x00005a9b67e273f7 in ExecInitNode (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at execProcnode.c:177
> #3 0x00005a9b67e1b9d2 in InitPlan (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:1092
> #4 0x00005a9b67e1a524 in standard_ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:268
> #5 0x00005a9b67e1a223 in ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:142
> ...
>
> starting from cbc127917.
>
> (I've discovered this anomaly with SQLsmith.)
Thanks! It looks like I missed updating the MERGE-related lists in ModifyTable.
I've attached a fix with a test added based on your example. I plan to
push this on Monday.
--
Thanks, Amit Langote
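
The idea behind the fix can be illustrated with a toy model of parallel
per-relation lists. This is a minimal sketch, assuming integer IDs in place of
the real list contents; the toy_* names and the ID values are hypothetical and
do not appear in the patch. The point it shows: once the result relations are
trimmed by initial pruning, every list indexed in step with them has to be
trimmed with the same mask, otherwise position i in one list no longer
describes position i in the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy into 'out' only the entries of 'in' whose result relation survived
 * initial pruning (kept[i] is true). Returns the number of entries kept.
 * 'out' must have room for n entries. */
size_t toy_trim_by_kept(const int *in, const bool *kept, size_t n, int *out)
{
    size_t nkept = 0;

    for (size_t i = 0; i < n; i++)
        if (kept[i])
            out[nkept++] = in[i];
    return nkept;
}

/* Trim a result relation list and its per-relation MERGE action list with
 * the same mask, then verify that they still line up position by position. */
bool toy_lists_stay_aligned(void)
{
    const int  rel_ids[3]    = {11, 12, 13};        /* hypothetical RT indexes */
    const int  action_ids[3] = {110, 120, 130};     /* action i belongs to rel i */
    const bool kept[3]       = {false, true, true}; /* relation 11 was pruned */
    int        rels_out[3];
    int        actions_out[3];
    size_t     nrels = toy_trim_by_kept(rel_ids, kept, 3, rels_out);
    size_t     nacts = toy_trim_by_kept(action_ids, kept, 3, actions_out);

    if (nrels != nacts)
        return false;
    for (size_t i = 0; i < nrels; i++)
        if (actions_out[i] != rels_out[i] * 10)
            return false;
    return true;
}
```

In the toy, walking the untrimmed three-entry action list while indexing the
two-entry trimmed relation array would both pair action 110 with relation 12
and step past the end of the array, which is roughly the shape of the failure
in the report.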
Attachments:
[application/octet-stream] 0001-Fix-an-oversight-in-cbc127917-for-MERGE-handling.patch (6.5K, 2-0001-Fix-an-oversight-in-cbc127917-for-MERGE-handling.patch)
download | inline diff:
From 07784159aea4de7b5614fd7a39bb6eeafe07cb22 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 15 Feb 2025 16:39:54 +0900
Subject: [PATCH] Fix an oversight in cbc127917 for MERGE handling
ExecInitModifyTable() should also trim MERGE-related lists to exclude
result relations pruned during initial pruning.
Reported-by: Alexander Lakhin <[email protected]> (via sqlsmith)
Discussion: https://postgr.es/m/[email protected]
---
src/backend/executor/nodeModifyTable.c | 24 ++++++++++---
src/include/nodes/execnodes.h | 7 ++--
src/test/regress/expected/partition_prune.out | 34 +++++++++++++++++++
src/test/regress/sql/partition_prune.sql | 13 +++++++
4 files changed, 72 insertions(+), 6 deletions(-)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a15e7863b0d..e0f859ba966 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3667,14 +3667,14 @@ ExecInitMerge(ModifyTableState *mtstate, EState *estate)
* anything here, do so there too.
*/
i = 0;
- foreach(lc, node->mergeActionLists)
+ foreach(lc, mtstate->mt_mergeActionLists)
{
List *mergeActionList = lfirst(lc);
Node *joinCondition;
TupleDesc relationDesc;
ListCell *l;
- joinCondition = (Node *) list_nth(node->mergeJoinConditions, i);
+ joinCondition = (Node *) list_nth(mtstate->mt_mergeJoinConditions, i);
resultRelInfo = mtstate->resultRelInfo + i;
i++;
relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
@@ -4475,6 +4475,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
List *withCheckOptionLists = NIL;
List *returningLists = NIL;
List *updateColnosLists = NIL;
+ List *mergeActionLists = NIL;
+ List *mergeJoinConditions = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4518,6 +4520,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
updateColnosLists = lappend(updateColnosLists, updateColnosList);
}
+ if (node->mergeActionLists)
+ {
+ List *mergeActionList = list_nth(node->mergeActionLists, i);
+
+ mergeActionLists = lappend(mergeActionLists, mergeActionList);
+ }
+ if (node->mergeJoinConditions)
+ {
+ List *mergeJoinCondition = list_nth(node->mergeJoinConditions, i);
+
+ mergeJoinConditions = lappend(mergeJoinConditions, mergeJoinCondition);
+ }
}
i++;
}
@@ -4544,6 +4558,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->mt_merge_updated = 0;
mtstate->mt_merge_deleted = 0;
mtstate->mt_updateColnosLists = updateColnosLists;
+ mtstate->mt_mergeActionLists = mergeActionLists;
+ mtstate->mt_mergeJoinConditions = mergeJoinConditions;
/*----------
* Resolve the target relation. This is the same as:
@@ -4599,8 +4615,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
- if (node->mergeActionLists)
- mergeActions = list_nth(node->mergeActionLists, i);
+ if (mergeActionLists)
+ mergeActions = list_nth(mergeActionLists, i);
if (resultRelInfo != mtstate->rootResultRelInfo)
{
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e2d1dc1e067..66fa6133343 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1448,10 +1448,13 @@ typedef struct ModifyTableState
double mt_merge_deleted;
/*
- * List of valid updateColnosLists. Contains only those belonging to
- * unpruned relations from ModifyTable.updateColnosLists.
+ * Lists of valid updateColnosLists, mergeActionLists, and
+ * mergeJoinConditions. These contain only those belonging to unpruned
+ * relations from the respective Lists in the ModifyTable.
*/
List *mt_updateColnosLists;
+ List *mt_mergeActionLists;
+ List *mt_mergeJoinConditions;
} ModifyTableState;
/* ----------------
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e667503c961..3261da28219 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4513,5 +4513,39 @@ execute update_part_abc_view (2, 'a');
ERROR: new row violates check option for view "part_abc_view"
DETAIL: Failing row contains (2, a, t).
deallocate update_part_abc_view;
+-- Runtime pruning on MERGE using a stable function
+create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable;
+explain (costs off)
+merge into part_abc_view pt
+using (select stable_one() as pid) as q join part_abc_1 pt1 on (q.pid = pt1.a) on pt.a = pt1.a
+when matched then delete returning pt.a;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Merge on part_abc
+ Merge on part_abc_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = stable_one()))
+ -> Materialize
+ -> Seq Scan on part_abc_1 pt1
+ Filter: (a = stable_one())
+(10 rows)
+
+merge into part_abc_view pt
+using (select stable_one() as pid) as q join part_abc_1 pt1 on (q.pid = pt1.a) on pt.a = pt1.a
+when matched then delete returning pt.a;
+ a
+---
+ 1
+(1 row)
+
+table part_abc_view;
+ a | b | c
+---+---+---
+ 2 | c | t
+(1 row)
+
drop view part_abc_view;
drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 730545e86a7..b27f3ace73c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1372,5 +1372,18 @@ execute update_part_abc_view (1, 'd');
explain (costs off) execute update_part_abc_view (2, 'a');
execute update_part_abc_view (2, 'a');
deallocate update_part_abc_view;
+
+-- Runtime pruning on MERGE using a stable function
+create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable;
+explain (costs off)
+merge into part_abc_view pt
+using (select stable_one() as pid) as q join part_abc_1 pt1 on (q.pid = pt1.a) on pt.a = pt1.a
+when matched then delete returning pt.a;
+
+merge into part_abc_view pt
+using (select stable_one() as pid) as q join part_abc_1 pt1 on (q.pid = pt1.a) on pt.a = pt1.a
+when matched then delete returning pt.a;
+table part_abc_view;
+
drop view part_abc_view;
drop table part_abc;
--
2.43.0
* Re: generic plans and "initial" pruning
From: Junwang Zhao @ 2025-02-16 04:37 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
Hi Amit,
On Sat, Feb 15, 2025 at 3:51 PM Amit Langote <[email protected]> wrote:
>
> Hi Alexander,
>
> On Sat, Feb 15, 2025 at 6:00 AM Alexander Lakhin <[email protected]> wrote:
> >
> > Hello Amit,
> >
> > 06.02.2025 04:35, Amit Langote wrote:
> >
> > I plan to push 0001 tomorrow, barring any objections.
> >
> >
> > Please try the following script:
> > CREATE TABLE pt (a int, b int) PARTITION BY range (a);
> > CREATE TABLE tp1 PARTITION OF pt FOR VALUES FROM (1) TO (2);
> > CREATE TABLE tp2 PARTITION OF pt FOR VALUES FROM (2) TO (3);
> >
> > MERGE INTO pt
> > USING (SELECT pg_backend_pid() AS pid) AS q JOIN tp1 ON (q.pid = tp1.a)
> > ON pt.a = tp1.a
> > WHEN MATCHED THEN DELETE;
> >
> > which fails for me with segfault:
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0 ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
> > 3680 relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
> > (gdb) bt
> > #0 ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
> > #1 0x00005a9b67e6dfb5 in ExecInitModifyTable (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at nodeModifyTable.c:4906
> > #2 0x00005a9b67e273f7 in ExecInitNode (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at execProcnode.c:177
> > #3 0x00005a9b67e1b9d2 in InitPlan (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:1092
> > #4 0x00005a9b67e1a524 in standard_ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:268
> > #5 0x00005a9b67e1a223 in ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:142
> > ...
> >
> > starting from cbc127917.
> >
> > (I've discovered this anomaly with SQLsmith.)
>
> Thanks! It looks like I missed updating the MERGE-related lists in ModifyTable.
>
> I've attached a fix with a test added based on your example. I plan to
> push this on Monday.
>
I applied the patch and it fixes the problem. I have a small question:
should the following line
```
if (node->mergeActionLists == NIL)
```
be changed to
```
if (mtstate->mt_mergeActionLists == NIL)
```
ISTM that if we have pruned all the merge actions, there is no harm if we
omit setting mtstate->mt_merge_subcommands to 0.
> --
> Thanks, Amit Langote
--
Regards
Junwang Zhao
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2025-02-17 07:15 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
Hi Junwang,
On Sun, Feb 16, 2025 at 1:37 PM Junwang Zhao <[email protected]> wrote:
> On Sat, Feb 15, 2025 at 3:51 PM Amit Langote <[email protected]> wrote:
> > Thanks! It looks like I missed updating the MERGE-related lists in ModifyTable.
> >
> > I've attached a fix with a test added based on your example. I plan to
> > push this on Monday.
> >
>
> I applied the patch and it fixes the problem.
Thanks for checking.
> I have a small question:
> should the following line
>
> ```
> if (node->mergeActionLists == NIL)
> ```
>
> be changed to
>
> ```
> if (mtstate->mt_mergeActionLists == NIL)
> ```
>
> ISTM that if we have pruned all the merge actions, there is no harm if we
> omit setting mtstate->mt_merge_subcommands to 0.
Yeah, that seems harmless, so done.
I have pushed the fix now.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Tomas Vondra @ 2024-12-05 13:53 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On 12/5/24 07:53, Amit Langote wrote:
> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
>> ...
>>
>>>> What if an
>>>> extension doesn't do that? What weirdness will happen?
>>>
>>> The QueryDesc.planstate won't contain a PlanState tree for starters
>>> and other state information that InitPlan() populates in EState based
>>> on the PlannedStmt.
>>
>> OK, and the consequence is that the query will fail, right?
>
> No, the core executor will retry the execution with a new updated
> plan. In the absence of the early return, the extension might even
> crash when accessing such incomplete QueryDesc.
>
> What the patch makes the ExecutorStart_hook do is similar to how
> InitPlan() will return early when locks taken on partitions that
> survive initial pruning invalidate the plan.
>
Isn't that what I said? My question was what happens if the extension
does not add the new ExecPlanStillValid() call - sorry if that wasn't
clear. If it can crash, that's what I meant by "fail".
>>>> Maybe it'd be
>>>> possible to at least check this in some other executor hook? Or at least
>>>> we could ensure the check was done in assert-enabled builds? Or
>>>> something to make extension authors aware of this?
>>>
>>> I've added a note in the commit message, but if that's not enough, one
>>> idea might be to change the return type of ExecutorStart_hook so that
>>> the extensions that implement it are forced to be adjusted. Say, from
>>> void to bool to indicate whether standard_ExecutorStart() succeeded
>>> and thus created a "valid" plan. I had that in the previous versions
>>> of the patch. Thoughts?
>>
>> Maybe. My concern is that this case (plan getting invalidated) is fairly
>> rare, so it's entirely plausible the extension will seem to work just
>> fine without the code update for a long time.
>
> You might see the errors like the one below when the core executor or
> a hook tries to initialize or process in some other way a known
> invalid plan, for example, because an unpruned partition's index got
> concurrently dropped before the executor got the lock:
>
> ERROR: could not open relation with OID xxx
>
Yeah, but how likely is that? How often do plans get invalidated in a
regular application workload? People don't create or drop indexes very
often, for example ...
Again, I'm not saying requiring the call would be unacceptable, I'm sure
we made similar changes in the past. But if we could avoid needing it
without too much contortion, that would be nice.
regards
--
Tomas Vondra
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2024-12-06 08:26 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>; Tom Lane <[email protected]>
On Thu, Dec 5, 2024 at 10:53 PM Tomas Vondra <[email protected]> wrote:
> On 12/5/24 07:53, Amit Langote wrote:
> > On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
> >> ...
> >>
> >>>> What if an
> >>>> extension doesn't do that? What weirdness will happen?
> >>>
> >>> The QueryDesc.planstate won't contain a PlanState tree for starters
> >>> and other state information that InitPlan() populates in EState based
> >>> on the PlannedStmt.
> >>
> >> OK, and the consequence is that the query will fail, right?
> >
> > No, the core executor will retry the execution with a new updated
> > plan. In the absence of the early return, the extension might even
> > crash when accessing such incomplete QueryDesc.
> >
> > What the patch makes the ExecutorStart_hook do is similar to how
> > InitPlan() will return early when locks taken on partitions that
> > survive initial pruning invalidate the plan.
>
> Isn't that what I said? My question was what happens if the extension
> does not add the new ExecPlanStillValid() call - sorry if that wasn't
> clear. If it can crash, that's what I meant by "fail".
Ok, I see. So, I suppose you wanted to confirm that an invalid plan
won't be silently executed and return wrong results. Yes, I don't
think that would happen given the kinds of invalidations that are
possible. The various checks in the ExecInitNode() path, such as the
one that catches a missing index, will prevent the plan from running.
I may not have searched exhaustively enough though.
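
As a rough illustration of the early return being discussed, here is a self-contained sketch of a start hook that checks plan validity before touching executor state. It deliberately does not include any PostgreSQL headers: MockQueryDesc, mock_standard_start(), mock_plan_still_valid() and mock_extension_start_hook() are hypothetical stand-ins for QueryDesc, standard_ExecutorStart(), ExecPlanStillValid() and an extension's ExecutorStart_hook, and the real signatures may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of QueryDesc that matter here. */
typedef struct MockQueryDesc
{
    bool  plan_valid;       /* false if taking locks invalidated the plan */
    void *planstate;        /* NULL when initialization returned early */
} MockQueryDesc;

static int mock_planstate_token;

/* Stand-in for the core start routine; 'invalidate' simulates a lost race. */
void
mock_standard_start(MockQueryDesc *qd, bool invalidate)
{
    if (invalidate)
    {
        qd->plan_valid = false;
        qd->planstate = NULL;   /* no PlanState tree was built */
        return;
    }
    qd->plan_valid = true;
    qd->planstate = &mock_planstate_token;
}

/* Stand-in for the validity check the extension is expected to call. */
bool
mock_plan_still_valid(const MockQueryDesc *qd)
{
    return qd->plan_valid;
}

/*
 * Hook body: run the core routine first, then return early if the plan
 * was invalidated, leaving the retry to the caller.  Returns true only
 * when it is safe to do extension-specific work.
 */
bool
mock_extension_start_hook(MockQueryDesc *qd, bool invalidate)
{
    mock_standard_start(qd, invalidate);

    if (!mock_plan_still_valid(qd))
        return false;

    /* Only now is the executor state known to be fully initialized. */
    assert(qd->planstate != NULL);
    return true;
}
```

A hook that skips the validity check would go on to use qd->planstate while it is still NULL, which corresponds to the crash scenario mentioned earlier in this exchange.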
> >>>> Maybe it'd be
> >>>> possible to at least check this in some other executor hook? Or at least
> >>>> we could ensure the check was done in assert-enabled builds? Or
> >>>> something to make extension authors aware of this?
> >>>
> >>> I've added a note in the commit message, but if that's not enough, one
> >>> idea might be to change the return type of ExecutorStart_hook so that
> >>> the extensions that implement it are forced to be adjusted. Say, from
> >>> void to bool to indicate whether standard_ExecutorStart() succeeded
> >>> and thus created a "valid" plan. I had that in the previous versions
> >>> of the patch. Thoughts?
> >>
> >> Maybe. My concern is that this case (plan getting invalidated) is fairly
> >> rare, so it's entirely plausible the extension will seem to work just
> >> fine without the code update for a long time.
> >
> > You might see the errors like the one below when the core executor or
> > a hook tries to initialize or process in some other way a known
> > invalid plan, for example, because an unpruned partition's index got
> > concurrently dropped before the executor got the lock:
> >
> > ERROR: could not open relation with OID xxx
>
> Yeah, but how likely is that? How often do plans get invalidated in a
> regular application workload? People don't create or drop indexes very
> often, for example ...
Yeah, that's a valid point. Andres once mentioned that ANALYZE can
invalidate plans and that can occur frequently in busy systems.
> Again, I'm not saying requiring the call would be unacceptable, I'm sure
> we made similar changes in the past. But if we could avoid needing it
> without too much contortion, that would be nice.
I tend to agree.
Another change introduced by the patch that extensions might need to
mind (noted in the commit message of v58-0004) is the addition of the
es_unpruned_relids field to EState. This field tracks the RT indexes
of relations that are locked and therefore safe to access during
execution. Importantly, it does not include the RT indexes of leaf
partitions that are pruned during "initial" pruning and thus remain
unlocked.
This change means that executor extensions can no longer assume that
all relations in the range table are locked and safe to access.
Instead, extensions must account for the possibility that some
relations, specifically pruned partitions, are not locked. Normally,
executor code accesses relations using ExecGetRangeTableRelation(),
which does not take a lock before returning the Relation pointer,
assuming that locks are already managed upstream.
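
A minimal sketch of the check an extension would need, assuming a simplified model in which the set of unpruned RT indexes is a 64-bit mask. MockEState, mock_rel_is_unpruned() and mock_count_accessible() are hypothetical names for this example; in the server the field is es_unpruned_relids, and the equivalent test would presumably be a bitmapset membership check such as bms_is_member().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for EState, keeping only the unpruned set. */
typedef struct MockEState
{
    uint64_t unpruned_relids;   /* bit i set => RT index i is locked */
} MockEState;

/* True if the relation at RT index 'rti' is locked and safe to access. */
bool
mock_rel_is_unpruned(const MockEState *estate, unsigned rti)
{
    assert(rti >= 1 && rti < 64);   /* RT indexes are 1-based */
    return (estate->unpruned_relids & (UINT64_C(1) << rti)) != 0;
}

/*
 * Count the relations an extension may touch, skipping any RT index
 * that is not in the unpruned set instead of assuming it is locked.
 */
int
mock_count_accessible(const MockEState *estate, const unsigned *rtis, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        if (mock_rel_is_unpruned(estate, rtis[i]))
            count++;
    }
    return count;
}
```

With RT indexes 1 and 3 in the set, a loop over indexes 1, 2 and 3 visits only two relations; index 2 stands for a partition pruned during "initial" pruning and therefore left unlocked.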
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
From: Tom Lane @ 2024-12-04 17:32 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Amit Langote <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Tomas Vondra <[email protected]> writes:
> I'm not forcing you to do elog, if you think ereport() is better. I'm
> only asking because AFAIK the "policy" is that ereport is for cases that
> we think can happen (and thus get translated), while elog(ERROR) is for
> cases that we believe shouldn't happen.
The proposed coding looks fine from that perspective, because it uses
errmsg_internal and errdetail_internal which don't give rise to
translatable strings. Having said that, if we think this is a
"can't happen" case then it's fair to wonder why go to such lengths
to format it prettily. Also, I'd argue that the error message
style guidelines still apply, but this errdetail doesn't conform.
regards, tom lane
* Re: generic plans and "initial" pruning
From: Amit Langote @ 2024-12-05 02:14 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, Dec 5, 2024 at 2:32 AM Tom Lane <[email protected]> wrote:
> Tomas Vondra <[email protected]> writes:
> > I'm not forcing you to do elog, if you think ereport() is better. I'm
> > only asking because AFAIK the "policy" is that ereport is for cases that
> > we think can happen (and thus get translated), while elog(ERROR) is for
> > cases that we believe shouldn't happen.
>
> The proposed coding looks fine from that perspective, because it uses
> errmsg_internal and errdetail_internal which don't give rise to
> translatable strings. Having said that, if we think this is a
> "can't happen" case then it's fair to wonder why go to such lengths
> to format it prettily. Also, I'd argue that the error message
> style guidelines still apply, but this errdetail doesn't conform.
Thinking about this further, perhaps an Assert is sufficient here. An
Append/MergeAppend node's part_prune_index not pointing to the correct
entry in the global "flat" list of PartitionPruneInfos would indicate
a bug. It seems unlikely that user actions could cause this issue.
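
For concreteness, the assertion-based variant might look roughly like the self-contained sketch below. It uses the C library's assert() and a 64-bit mask in place of the server's Assert() and Bitmapset; MockPruneInfo and mock_lookup_pruneinfo() are hypothetical names for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for PartitionPruneInfo, keeping only its relids. */
typedef struct MockPruneInfo
{
    uint64_t relids;        /* relations this pruning info was built for */
} MockPruneInfo;

/*
 * Fetch the entry that a plan node's part_prune_index points to in the
 * flat list.  A mismatch between the node's relids and the entry's
 * relids can only come from a planner bug, so it is asserted rather
 * than reported through a formatted error message.
 */
const MockPruneInfo *
mock_lookup_pruneinfo(const MockPruneInfo *flat_list, int nentries,
                      int part_prune_index, uint64_t node_relids)
{
    const MockPruneInfo *pruneinfo;

    assert(part_prune_index >= 0 && part_prune_index < nentries);
    pruneinfo = &flat_list[part_prune_index];

    /* "Can't happen" check; compiled out when assertions are disabled. */
    assert(pruneinfo->relids == node_relids);

    return pruneinfo;
}
```

The trade-off is that the check disappears in builds without assertions, so a mismatch there would go undetected at this point.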
--
Thanks, Amit Langote