public inbox for [email protected]
help / color / mirror / Atom feedFrom: Amit Langote <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Alvaro Herrera <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Daniel Gustafsson <[email protected]>
Cc: David Rowley <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Thom Brown <[email protected]>
Cc: Tom Lane <[email protected]>
Subject: Re: generic plans and "initial" pruning
Date: Fri, 6 Dec 2024 17:18:47 +0900
Message-ID: <CA+HiwqFhkpXHAA=4NY5SqYXX08uq=nYtXcSByNZF=2MAy1UA7A@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CA+HiwqFpZ80UJKr4tZus4Omgg7YESzFXKSwSHRW2Ap2=XSVyUA@mail.gmail.com>
<CA+HiwqH9u1RWn9OEa=VQQpJagB0hDLCY+=fSyBC4ZkeU6Gg2HA@mail.gmail.com>
<CA+HiwqFMWt2MQVqhp2rZA8=ugPVD=5uW10QCdK_NpoyWyFLe-g@mail.gmail.com>
<CA+HiwqGBpw_JNwkwZjQ2YaqTWrDjn9L5jpuc+nS8=a55SPD+UA@mail.gmail.com>
<CA+HiwqFGz2uShfU=qtack9gii6Kzyjv1V66tJJBYBN8Acb4uTA@mail.gmail.com>
<CA+HiwqE7+iwMH4NYtFi28Pt9fT_gRW+Gt-=CvOX=Pkquo=AN8w@mail.gmail.com>
<CA+TgmobO_6irkJGkzkxHTR=kza_CG+0idAhFUWqGfXCVQQmuPg@mail.gmail.com>
<CA+HiwqH45ZCQkWoLzjOyS6bQ9QsF7yDCKVwiEUtB_RwPFwLmQg@mail.gmail.com>
<CA+HiwqHRRFQN6yZ54fBydOTM6ncqZBCmewZ6n519RjRdDsO44g@mail.gmail.com>
<[email protected]>
<CA+HiwqH8N-SxEB6SysEBsYNgV_KJs66k9Z2SNmqVzbBP-60yWg@mail.gmail.com>
<[email protected]>
<CA+HiwqEmG9YCQvG6uux7sO=jKFSAW6hA4Ea-ymfD+JhJAe4PWQ@mail.gmail.com>
<CA+HiwqE2FfJfH=siLiR3kJ13tmXZORAGTWsZc2r52o1_5BDv+g@mail.gmail.com>
<[email protected]>
On Thu, Dec 5, 2024 at 11:07 PM Tomas Vondra <[email protected]> wrote:
> On 12/5/24 12:28, Amit Langote wrote:
> > On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <[email protected]> wrote:
> >> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <[email protected]> wrote:
> >>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> >>> might be a way to not have this issue, or at least notice the missing
> >>> call early.
> >>>
> >>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> >>> the retries internally? I mean, the extensions wouldn't need to check if
> >>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> >>> it might need some new arguments, but that's more obvious.
> >>
> >> One approach could be to move some code from standard_ExecutorStart()
> >> into ExecutorStart(). Specifically, the code responsible for setting
> >> up enough state in the EState to perform ExecDoInitialPruning(), which
> >> takes locks that might invalidate the plan. If the plan does become
> >> invalid, the hook and standard_ExecutorStart() are not called.
> >> Instead, the caller, ExecutorStartExt() in this case, creates a new
> >> plan.
> >>
> >> This avoids the need to add ExecPlanStillValid() checks anywhere,
> >> whether in core or extension code. However, it does mean accessing the
> >> PlannedStmt earlier than InitPlan(), but the current placement of the
> >> code is not exactly set in stone.
> >
> > I tried this approach and found that it essentially disables testing
> > of this patch using the delay_execution module, which relies on the
> > ExecutorStart_hook(). The way the testing works is that the hook in
> > delay_execution.c pauses the execution of a cached plan to allow a
> > concurrent session to drop an index referenced in the plan. When
> > unpaused, execution initialization resumes by calling
> > standard_ExecutorStart(). At this point, obtaining the lock on the
> > partition whose index has been dropped invalidates the plan, which the
> > hook detects and reports. It then also reports the successful
> > re-execution of an updated plan that no longer references the dropped
> > index. Hmm.
> >
>
> It's not clear to me why the change disables this testing, and I can't
> try without a patch. Could you explain?
Sorry, PFA the delta patch for the change I described above. It
applies on top of v58 series of patches that I posted yesterday.
You'll notice that delay_execution test fails if you apply and do
check-world.
As for how the change breaks the testing, here is a before and after
of the flow of a isolation test in
src/test/modules/delay_execution/specs/cached-plan-inval.spec (s1 is
the session used to run a cached plan, s2 to perform concurrent DDL
that invalidates the plan):
* Before (working):
1. s2 takes advisory lock
2. s1 runs cached plan -> goes to ExecutorStart_hook -> waits for the
advisory lock
3. s2 drops an index referenced in the plan
4. s2 unlocks advisory lock
5. s1 locks unpruned partitions -> detects plan invalidation due to
dropped index.
* After (stops working because initial pruning and locking are done
before calling ExecutorStart_hook):
1. s2 takes advisory lock
2. s1 runs cached plan -> locks unpruned partitions -> goes to
ExecutorStart_hook to get advisory lock -> waits for advisory lock
3. s2 drops an index referenced in the plan -> waits for lock on the
unpruned partition -> deadlock!
One idea I had after sending the email yesterday is to introduce
ExecutorStartCachedPlan_hook for the advisory lock based waiting.
ExecutorStartCachedPlan() is the new function that you will find in
v58-0004 that wraps ExecutorStart() to handle plan invalidation. This
new hook would be called before ExecutorStartCachedPlan() calls
ExecutorStart(), so the original testing flow can still work.
Another idea might be to use injection points infra to introduce the
wait instead of the combination of a executor hook and advisory lock.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] pruning-in-ExecutorStart.diff (13.6K, 2-pruning-in-ExecutorStart.diff)
download | inline diff:
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 8b5eaf3ef3..623a674f99 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -298,10 +298,6 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!ExecPlanStillValid(queryDesc->estate))
- return;
-
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index b11691ae26..49c657b3e0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -994,10 +994,6 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!ExecPlanStillValid(queryDesc->estate))
- return;
-
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9543d9490c..18758287bf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -123,6 +123,16 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ EState *estate;
+ MemoryContext oldcontext;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *rangeTable = plannedstmt->rtable;
+ CachedPlan *cachedplan = queryDesc->cplan;
+
+ /* sanity checks: queryDesc must not be started already */
+ Assert(queryDesc != NULL);
+ Assert(queryDesc->estate == NULL);
+
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -133,6 +143,117 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+ /*
+ * Build EState, switch into per-query memory context for startup.
+ */
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+ /*
+ * Fill in external parameters, if any, from queryDesc; and allocate
+ * workspace for internal parameters
+ */
+ estate->es_param_list_info = queryDesc->params;
+
+ if (queryDesc->plannedstmt->paramExecTypes != NIL)
+ {
+ int nParamExec;
+
+ nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
+ estate->es_param_exec_vals = (ParamExecData *)
+ palloc0(nParamExec * sizeof(ParamExecData));
+ }
+
+ /* We now require all callers to provide sourceText */
+ Assert(queryDesc->sourceText != NULL);
+ estate->es_sourceText = queryDesc->sourceText;
+
+ /*
+ * Fill in the query environment, if any, from queryDesc.
+ */
+ estate->es_queryEnv = queryDesc->queryEnv;
+
+ /*
+ * If non-read-only query, set the command ID to mark output tuples with
+ */
+ switch (queryDesc->operation)
+ {
+ case CMD_SELECT:
+
+ /*
+ * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
+ * tuples
+ */
+ if (queryDesc->plannedstmt->rowMarks != NIL ||
+ queryDesc->plannedstmt->hasModifyingCTE)
+ estate->es_output_cid = GetCurrentCommandId(true);
+
+ /*
+ * A SELECT without modifying CTEs can't possibly queue triggers,
+ * so force skip-triggers mode. This is just a marginal efficiency
+ * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
+ * all that expensive, but we might as well do it.
+ */
+ if (!queryDesc->plannedstmt->hasModifyingCTE)
+ eflags |= EXEC_FLAG_SKIP_TRIGGERS;
+ break;
+
+ case CMD_INSERT:
+ case CMD_DELETE:
+ case CMD_UPDATE:
+ case CMD_MERGE:
+ estate->es_output_cid = GetCurrentCommandId(true);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized operation code: %d",
+ (int) queryDesc->operation);
+ break;
+ }
+
+ /*
+ * Copy other important information into the EState
+ */
+ estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
+ estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
+ estate->es_top_eflags = eflags;
+ estate->es_instrument = queryDesc->instrument_options;
+ estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
+
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_unpruned_relids = bms_copy(plannedstmt->unprunableRelids);
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+
+ /*
+ * initialize the node's execution state
+ */
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+
+ /*
+ * Perform runtime "initial" pruning to identify which child subplans,
+ * corresponding to the children of plan nodes that contain
+ * PartitionPruneInfo such as Append, will not be executed. The results,
+ * which are bitmapsets of indexes of the child subplans that will be
+ * executed, are saved in es_part_prune_results. These results correspond
+ * to each PartitionPruneInfo entry, and the es_part_prune_results list is
+ * parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate, cachedplan);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ if (cachedplan && !CachedPlanValid(cachedplan))
+ return;
+
if (ExecutorStart_hook)
(*ExecutorStart_hook) (queryDesc, eflags);
else
@@ -198,12 +319,12 @@ ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- EState *estate;
MemoryContext oldcontext;
+ EState *estate;
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
+ Assert(queryDesc->estate != NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -227,85 +348,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
ExecCheckXactReadOnly(queryDesc->plannedstmt);
- /*
- * Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
-
- oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
-
- /*
- * Fill in external parameters, if any, from queryDesc; and allocate
- * workspace for internal parameters
- */
- estate->es_param_list_info = queryDesc->params;
-
- if (queryDesc->plannedstmt->paramExecTypes != NIL)
- {
- int nParamExec;
-
- nParamExec = list_length(queryDesc->plannedstmt->paramExecTypes);
- estate->es_param_exec_vals = (ParamExecData *)
- palloc0(nParamExec * sizeof(ParamExecData));
- }
-
- /* We now require all callers to provide sourceText */
- Assert(queryDesc->sourceText != NULL);
- estate->es_sourceText = queryDesc->sourceText;
-
- /*
- * Fill in the query environment, if any, from queryDesc.
- */
- estate->es_queryEnv = queryDesc->queryEnv;
-
- /*
- * If non-read-only query, set the command ID to mark output tuples with
- */
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
-
- /*
- * SELECT FOR [KEY] UPDATE/SHARE and modifying CTEs need to mark
- * tuples
- */
- if (queryDesc->plannedstmt->rowMarks != NIL ||
- queryDesc->plannedstmt->hasModifyingCTE)
- estate->es_output_cid = GetCurrentCommandId(true);
-
- /*
- * A SELECT without modifying CTEs can't possibly queue triggers,
- * so force skip-triggers mode. This is just a marginal efficiency
- * hack, since AfterTriggerBeginQuery/AfterTriggerEndQuery aren't
- * all that expensive, but we might as well do it.
- */
- if (!queryDesc->plannedstmt->hasModifyingCTE)
- eflags |= EXEC_FLAG_SKIP_TRIGGERS;
- break;
-
- case CMD_INSERT:
- case CMD_DELETE:
- case CMD_UPDATE:
- case CMD_MERGE:
- estate->es_output_cid = GetCurrentCommandId(true);
- break;
-
- default:
- elog(ERROR, "unrecognized operation code: %d",
- (int) queryDesc->operation);
- break;
- }
-
- /*
- * Copy other important information into the EState
- */
- estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
- estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
- estate->es_top_eflags = eflags;
- estate->es_instrument = queryDesc->instrument_options;
- estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
-
/*
* Set up an AFTER-trigger statement context, unless told not to, or
* unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
@@ -313,6 +355,9 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
AfterTriggerBeginQuery();
+ estate = queryDesc->estate;
+ oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
/*
* Initialize the plan state tree
*/
@@ -922,46 +967,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
- CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
-
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = cachedplan;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
- estate->es_unpruned_relids = bms_copy(plannedstmt->unprunableRelids);
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- *
- * This will also add the RT indexes of surviving leaf partitions to
- * es_unpruned_relids.
- */
- ExecDoInitialPruning(estate);
-
- if (!ExecPlanStillValid(estate))
- return;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b349ccb211..126f330ef8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1812,7 +1812,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* use the same index to retrieve the pruning results.
*/
void
-ExecDoInitialPruning(EState *estate)
+ExecDoInitialPruning(EState *estate, CachedPlan *cplan)
{
ListCell *lc;
List *locked_relids = NIL;
@@ -1838,7 +1838,7 @@ ExecDoInitialPruning(EState *estate)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- if (ExecShouldLockRelations(estate))
+ if (cplan && CachedPlanRequiresLocking(cplan))
{
int rtindex = -1;
@@ -1866,7 +1866,7 @@ ExecDoInitialPruning(EState *estate)
* Release the useless locks if the plan won't be executed. This is the
* same as what CheckCachedPlan() in plancache.c does.
*/
- if (!ExecPlanStillValid(estate))
+ if (cplan && !CachedPlanValid(cplan))
{
foreach(lc, locked_relids)
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index a0843481f7..95d886884c 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -17,6 +17,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
+#include "utils/plancache.h"
/* See execPartition.c for the definitions. */
typedef struct PartitionDispatchData *PartitionDispatch;
@@ -136,7 +137,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
-void ExecDoInitialPruning(EState *estate);
+void ExecDoInitialPruning(EState *estate, CachedPlan *cplan);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6d72f7d9d6..6a7ca37753 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -265,31 +265,6 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
-/*
- * Is the CachedPlan in es_cachedplan still valid?
- *
- * Called from InitPlan() because invalidation messages that affect the plan
- * might be received after locks have been taken on runtime-prunable relations.
- * The caller should take appropriate action if the plan has become invalid.
- */
-static inline bool
-ExecPlanStillValid(EState *estate)
-{
- return estate->es_cachedplan == NULL ? true :
- CachedPlanValid(estate->es_cachedplan);
-}
-
-/*
- * Locks are needed only if running a cached plan that might contain unlocked
- * relations, such as a reused generic plan.
- */
-static inline bool
-ExecShouldLockRelations(EState *estate)
-{
- return estate->es_cachedplan == NULL ? false :
- CachedPlanRequiresLocking(estate->es_cachedplan);
-}
-
/* ----------------------------------------------------------------
* ExecProcNode
*
view thread (66+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: generic plans and "initial" pruning
In-Reply-To: <CA+HiwqFhkpXHAA=4NY5SqYXX08uq=nYtXcSByNZF=2MAy1UA7A@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox