MIME-Version: 1.0
References: 
 <CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com>
 <CAExHW5vaO3n8X7-0RytZqxZjF_99cLmfw-fFco_RUeRFDyXwCQ@mail.gmail.com>
In-Reply-To: 
 <CAExHW5vaO3n8X7-0RytZqxZjF_99cLmfw-fFco_RUeRFDyXwCQ@mail.gmail.com>
From: Amit Langote <amitlangote09@gmail.com>
Date: Fri, 31 Dec 2021 11:26:11 +0900
Message-ID: 
 <CA+HiwqGHGc9j2i14pX2LeBzXN9wt0OKR8_g8opaoGmSwTM0umA@mail.gmail.com>
Subject: Re: generic plans and "initial" pruning
To: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Content-Type: multipart/alternative; boundary="00000000000095bb8805d467e5b2"
Archived-At: 
 <https://www.postgresql.org/message-id/CA%2BHiwqGHGc9j2i14pX2LeBzXN9wt0OKR8_g8opaoGmSwTM0umA%40mail.gmail.com>
Precedence: bulk

--00000000000095bb8805d467e5b2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > Executing generic plans involving partitions is known to become slower
> > as partition count grows due to a number of bottlenecks, with
> > AcquireExecutorLocks() showing at the top in profiles.
> >
> > Previous attempt at solving that problem was by David Rowley [1],
> > where he proposed delaying locking of *all* partitions appearing under
> > an Append/MergeAppend until "initial" pruning is done during the
> > executor initialization phase.  A problem with that approach that he
> > has described in [2] is that leaving partitions unlocked can lead to
> > race conditions where the Plan node belonging to a partition can be
> > invalidated when a concurrent session successfully alters the
> > partition between AcquireExecutorLocks() saying the plan is okay to
> > execute and then actually executing it.
> >
> > However, using an idea that Robert suggested to me off-list a little
> > while back, it seems possible to determine the set of partitions that
> > we can safely skip locking.  The idea is to look at the "initial" or
> > "pre-execution" pruning instructions contained in a given Append or
> > MergeAppend node when AcquireExecutorLocks() is collecting the
> > relations to lock and consider relations from only those sub-nodes
> > that survive performing those instructions.   I've attempted
> > implementing that idea in the attached patch.
> >
>
> In which cases, we will have "pre-execution" pruning instructions that
> can be used to skip locking partitions? Can you please give a few
> examples where this approach will be useful?


This is mainly useful for prepared queries, for example:

prepare q as select * from partitioned_table where key = $1;

And even then, only when execute q(…) uses a generic plan. Generic plans are
problematic because they must contain nodes for all partitions (without any
plan time pruning), which means CheckCachedPlan() has to spend time
proportional to the number of partitions to determine that the plan is
still usable / has not been invalidated; most of that is
AcquireExecutorLocks().
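
To make the difference concrete, here is a rough toy model in Python. It is
not PostgreSQL code: the Partition class, both relations_to_lock_* functions,
and the range bounds are hypothetical names invented for illustration. It only
shows that locking every partition in a generic plan grows with the partition
count, while locking only the sub-nodes that survive "initial" pruning for the
supplied parameter value does not.

```python
# Toy model only: none of these names exist in PostgreSQL.
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    lower: int  # inclusive lower bound of the partition's key range
    upper: int  # exclusive upper bound of the partition's key range


def make_partitions(count, width=100):
    """Build `count` contiguous range partitions p0, p1, ... of equal width."""
    return [Partition(f"p{i}", i * width, (i + 1) * width) for i in range(count)]


def relations_to_lock_all(partitions):
    """Model of the existing behaviour: every partition in the plan is locked."""
    return [p.name for p in partitions]


def relations_to_lock_pruned(partitions, key):
    """Model of the proposed behaviour: evaluate the pruning condition with the
    parameter value first, then lock only the surviving partitions.
    (A linear scan keeps the sketch short; it says nothing about how real
    pruning looks up bounds.)"""
    return [p.name for p in partitions if p.lower <= key < p.upper]


parts = make_partitions(1000)
print(len(relations_to_lock_all(parts)))          # 1000
print(relations_to_lock_pruned(parts, key=4250))  # ['p42']
```

In this model the first call touches all 1000 partitions, whereas the second
touches only the one whose range contains the key.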

Other bottlenecks, not addressed in this patch, pertain to some executor
startup/shutdown subroutines that process the range table of a PlannedStmt
in its entirety, whose length is also proportional to the number of
partitions when the plan is generic.

> The benchmark is showing good results, indeed.


Thanks.
-- 
Amit Langote
EDB: http://www.enterprisedb.com


--00000000000095bb8805d467e5b2--