MIME-Version: 1.0
References: <CA+HiwqFpZ80UJKr4tZus4Omgg7YESzFXKSwSHRW2Ap2=XSVyUA@mail.gmail.com>
 <CA+HiwqHzKO9FT-CjFWo6OmkiCSYmbPspKXVex96tOBKf6S_x_w@mail.gmail.com>
 <CA+TgmoZGWyMXutfen-NNv9=QM7eCHn9R1bpLZ9N4sRURMOCK2A@mail.gmail.com>
 <CA+HiwqHNb9jrwOFHfmASfiGc=SnqXs7THwQ_Rta=z+ognYV8qw@mail.gmail.com>
 <CA+HiwqH9u1RWn9OEa=VQQpJagB0hDLCY+=fSyBC4ZkeU6Gg2HA@mail.gmail.com>
 <CA+HiwqFMWt2MQVqhp2rZA8=ugPVD=5uW10QCdK_NpoyWyFLe-g@mail.gmail.com>
 <CA+HiwqGBpw_JNwkwZjQ2YaqTWrDjn9L5jpuc+nS8=a55SPD+UA@mail.gmail.com>
 <CA+HiwqFGz2uShfU=qtack9gii6Kzyjv1V66tJJBYBN8Acb4uTA@mail.gmail.com>
 <CA+HiwqE7+iwMH4NYtFi28Pt9fT_gRW+Gt-=CvOX=Pkquo=AN8w@mail.gmail.com>
 <CA+TgmobO_6irkJGkzkxHTR=kza_CG+0idAhFUWqGfXCVQQmuPg@mail.gmail.com>
 <CA+HiwqH45ZCQkWoLzjOyS6bQ9QsF7yDCKVwiEUtB_RwPFwLmQg@mail.gmail.com>
 <CA+HiwqHRRFQN6yZ54fBydOTM6ncqZBCmewZ6n519RjRdDsO44g@mail.gmail.com>
 <54c35fb9-da3a-4754-ab8c-46ed0b612465@vondra.me> <CA+HiwqH8N-SxEB6SysEBsYNgV_KJs66k9Z2SNmqVzbBP-60yWg@mail.gmail.com>
 <684c70d7-180e-461d-9377-600c2db581ba@vondra.me>
In-Reply-To: <684c70d7-180e-461d-9377-600c2db581ba@vondra.me>
From: Amit Langote <amitlangote09@gmail.com>
Date: Thu, 5 Dec 2024 15:53:32 +0900
Message-ID: <CA+HiwqEmG9YCQvG6uux7sO=jKFSAW6hA4Ea-ymfD+JhJAe4PWQ@mail.gmail.com>
Subject: Re: generic plans and "initial" pruning
To: Tomas Vondra <tomas@vondra.me>
Cc: Robert Haas <robertmhaas@gmail.com>, Alvaro Herrera <alvherre@alvh.no-ip.org>, 
	Andres Freund <andres@anarazel.de>, Daniel Gustafsson <daniel@yesql.se>, 
	David Rowley <dgrowleyml@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, 
	Thom Brown <thom@linux.com>, Tom Lane <tgl@sss.pgh.pa.us>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://www.postgresql.org/message-id/CA%2BHiwqEmG9YCQvG6uux7sO%3DjKFSAW6hA4Ea-ymfD%2BJhJAe4PWQ%40mail.gmail.com>
Precedence: bulk

On Thu, Dec 5, 2024 at 2:20=E2=80=AFAM Tomas Vondra <tomas@vondra.me> wrote=
:
> On 12/4/24 14:34, Amit Langote wrote:
> > On Mon, Dec 2, 2024 at 3:36=E2=80=AFAM Tomas Vondra <tomas@vondra.me> w=
rote:
> >> 0001
> >> ----
> >>
> >> 1) But if we don't expect this error to actually happen, do we really
> >> need to make it ereport()? Maybe it should be plain elog(). I mean, it=
's
> >> "can't happen" and thus doesn't need translations etc.
> >>
> >>     if (!bms_equal(relids, pruneinfo->relids))
> >>         ereport(ERROR,
> >>                 errcode(ERRCODE_INTERNAL_ERROR),
> >>                 errmsg_internal("mismatching PartitionPruneInfo found =
at
> >> part_prune_index %d",
> >>                                 part_prune_index),
> >>                 errdetail_internal("plan node relids %s, pruneinfo
> >> relids %s",
> >>                                    bmsToString(relids),
> >>                                    bmsToString(pruneinfo->relids)));
> >
> > I'm fine with elog() here even if it causes the message to be longer:
> >
> > elog(ERROR, "mismatching PartitionPruneInfo found at part_prune_index
> > %d (plan node relids %s, pruneinfo relids %s)
> >
>
> I'm not forcing you to do elog, if you think ereport() is better. I'm
> only asking because AFAIK the "policy" is that ereport is for cases that
> think can happen (and thus get translated), while elog(ERROR) is for
> cases that we believe shouldn't happen.
>
> So every time I see "ereport" I ask myself "how could this happen" which
> doesn't seem to be the case here.
>
> >> Perhaps it should even be an assert?
> >
> > I am not sure about that.  Having a message handy might be good if a
> > user ends up hitting this case for whatever reason, like trying to run
> > a corrupted plan.
>
> I'm a bit skeptical about this, TBH. If we assume the plan is
> "corrupted", why should we notice in this particular place? I mean, it
> could be corrupted in a million different ways, and the chance that it
> got through all the earlier steps is like 1 in a 1.000.000.

Yeah, I am starting to think the same.  Btw, the idea to have a check
and elog() / ereport() came from Alvaro upthread:
https://www.postgresql.org/message-id/20221130181201.mfinyvtob3j5i2a6%40alv=
herre.pgsql

> >> 2) I'm not quite sure what "exec" partition pruning is?
> >>
> >> /*
> >>  * ExecInitPartitionPruning
> >>  *   Initialize the data structures needed for runtime "exec" partitio=
n
> >>  *   pruning and return the result of initial pruning, if available.
> >>
> >> Is that the same thing as "runtime pruning"?
> >
> > "Exec" pruning refers to pruning performed during execution, using
> > PARAM_EXEC parameters. In contrast, "init" pruning occurs during plan
> > initialization, using parameters whose values remain constant during
> > execution, such as PARAM_EXTERN parameters and stable functions.
> >
> > Before this patch, the ExecInitPartitionPruning function, called
> > during ExecutorStart(), performed "init" pruning and set up state in
> > the PartitionPruneState for subsequent "exec" pruning during
> > ExecutorRun(). With this patch, "init" pruning is performed well
> > before this function is called, leaving its sole responsibility to
> > setting up the state for "exec" pruning. It may be worth renaming the
> > function to better reflect this new role, rather than updating only
> > the comment.
> >
> > Actually, that is what I decided to do in the attached, along with
> > some other adjustments like moving ExecDoInitialPruning() to
> > execPartition.c from execMain.c, fixing up some obsolete comments,
> > etc.
> >
>
> I don't see any attachment :-(
>
> Anyway, if I understand correctly, the "runtime pruning" has two
> separate cases - initial pruning and exec pruning. Is that right?

That's correct.  These patches are about performing "initial" pruning
at a different time and place so that we can take the deferred locks
on the unpruned partitions before we perform ExecInitNode() on any of
the plan trees in the PlannedStmt.

> >> 0005
> >> ----
> >>
> >> 1) auto_explain.c - So what happens if the plan gets invalidated? The
> >> hook explain_ExecutorStart returns early, but then what? Does that bre=
ak
> >> the user session somehow, or what?
> >
> > It will get called again after ExecutorStartExt() loops back to do
> > ExecutorStart() with a new updated plan tree.
> >
> >> 2) Isn't it a bit fragile if this requires every extension to update
> >> and add the ExecPlanStillValid() calls to various places?
> >
> > The ExecPlanStillValid() call only needs to be added immediately after
> > the call to standard_ExecutorStart() in an extension's
> > ExecutorStart_hook() implementation.
> >
> >> What if an
> >> extension doesn't do that? What weirdness will happen?
> >
> > The QueryDesc.planstate won't contain a PlanState tree for starters
> > and other state information that InitPlan() populates in EState based
> > on the PlannedStmt.
>
> OK, and the consequence is that the query will fail, right?

No, the core executor will retry the execution with a new updated
plan.  In the absence of the early return, the extension might even
crash when accessing such incomplete QueryDesc.

What the patch makes the ExecutorStart_hook do is similar to how
InitPlan() will return early when locks taken on partitions that
survive initial pruning invalidate the plan.

> >> Maybe it'd be
> >> possible to at least check this in some other executor hook? Or at lea=
st
> >> we could ensure the check was done in assert-enabled builds? Or
> >> something to make extension authors aware of this?
> >
> > I've added a note in the commit message, but if that's not enough, one
> > idea might be to change the return type of ExecutorStart_hook so that
> > the extensions that implement it are forced to be adjusted. Say, from
> > void to bool to indicate whether standard_ExecutorStart() succeeded
> > and thus created a "valid" plan.  I had that in the previous versions
> > of the patch.  Thoughts?
>
> Maybe. My concern is that this case (plan getting invalidated) is fairly
> rare, so it's entirely plausible the extension will seem to work just
> fine without the code update for a long time.

You might see the errors like the one below when the core executor or
a hook tries to initialize or process in some other way a known
invalid plan, for example, because an unpruned partition's index got
concurrently dropped before the executor got the lock:

ERROR: could not open relation with OID xxx

> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> might be a way to not have this issue, or at least notice the missing
> call early.
>
> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> the retries internally? I mean, the extensions wouldn't need to check if
> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> it might need some new arguments, but that's more obvious.

One approach could be to move some code from standard_ExecutorStart()
into ExecutorStart(). Specifically, the code responsible for setting
up enough state in the EState to perform ExecDoInitialPruning(), which
takes locks that might invalidate the plan. If the plan does become
invalid, the hook and standard_ExecutorStart() are not called.
Instead, the caller, ExecutorStartExt() in this case, creates a new
plan.

This avoids the need to add ExecPlanStillValid() checks anywhere,
whether in core or extension code. However, it does mean accessing the
PlannedStmt earlier than InitPlan(), but the current placement of the
code is not exactly set in stone.

--=20
Thanks, Amit Langote