Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1n37cj-0001hX-H8 for pgsql-hackers@arkaria.postgresql.org; Fri, 31 Dec 2021 02:26:29 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1n37ch-0001Ca-ML for pgsql-hackers@arkaria.postgresql.org; Fri, 31 Dec 2021 02:26:27 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1n37ch-0001CQ-1Z for pgsql-hackers@lists.postgresql.org; Fri, 31 Dec 2021 02:26:27 +0000 Received: from mail-pg1-x536.google.com ([2607:f8b0:4864:20::536]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1n37cd-0000FY-Op for pgsql-hackers@postgresql.org; Fri, 31 Dec 2021 02:26:25 +0000 Received: by mail-pg1-x536.google.com with SMTP id a23so22794732pgm.4 for ; Thu, 30 Dec 2021 18:26:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ZjeP8r5TiqA5nSSqC1RjSVbG0IOBUsT3eTktuJR8TLE=; b=YKcS3Flb4SsnJHryXU78fi3lOtnqYD2Wphk3bHicL+X2QQgEDKe0k9WTEMw3Sc/NhF Lq0VmS19qWo735Vorg+qruwsvxiQMqU1aY4LxifqUo8BwCvMLtTDQ/dW7R2OGVty+bRc ELfengU9wW8EMU5NxHWQFhHeke8l4Pnhkhhq5NEe6tacnPErJez5Y8pHOHIgd8rVqPTG XhF88pGwCXXJxACiCUhG5eOxWWbTQmFg9hKQoTfSztISJIjct6v4EyYsFFvTxd8de0HC tk9feKjn3PoUGM/doipL+j3XHY5nzEpBQLR7YrlkZTha+sM/D+5b+bWVrGxzu4oXP0pe yJIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ZjeP8r5TiqA5nSSqC1RjSVbG0IOBUsT3eTktuJR8TLE=; b=z1S1yDSLSkzZ9zQvhEywbF49hXnCiuzJPo+i3KRXt7Lzqmp2Wague2mH765gogs0Og VdoWdY1K14Gg1ABz5CZhim1GWRnZaQYjCMdFgSzkcCLJgCB7c+ip6+2N1YcjnCFRs2+8 dtzdpRJc1s0hxV1eAkvIXBooWGL48F7QTOSDuKu32gzJc3gk9IO++WqQ3Jj6YteMze+g /ykpB6//hs6TXZBS5Qcrjz3MrNi2CsnKsn8QI2MLKpHYPyo1GrLIYBjfNMHnxKhp/WqQ Eu/lK3MqNgX5bPhTFg4THvxiHIsiE6F1skcu2N0vspuvuhBnYCl1TJZr+dmNmwEkuQOx uDgQ== X-Gm-Message-State: AOAM533FXd8gfJzy6s4XQDNK5AvOf8S2oPpH0Bne0nwHxruO1R6jTsPb Z4QrXSIs5s44W6e/bxz3nmYUDhb2zEQAzDUnSDQ= X-Google-Smtp-Source: ABdhPJw8ojGR9dkA8NICm7BjhzARA48at1wbSM5WFILuZ5QfpZ37amVUCQ/9SKqmiWkUTJScFFMnCgJt0+t/AeM5mMw= X-Received: by 2002:a65:418b:: with SMTP id a11mr30085539pgq.620.1640917582067; Thu, 30 Dec 2021 18:26:22 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Amit Langote Date: Fri, 31 Dec 2021 11:26:11 +0900 Message-ID: Subject: Re: generic plans and "initial" pruning To: Ashutosh Bapat Cc: PostgreSQL-development Content-Type: multipart/alternative; boundary="00000000000095bb8805d467e5b2" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000095bb8805d467e5b2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat wrote: > On Sat, Dec 25, 2021 at 9:06 AM Amit Langote > wrote: > > > > Executing generic plans involving partitions is known to become slower > > as partition count grows due to a number of bottlenecks, with > > AcquireExecutorLocks() showing at the top in profiles. > > > > Previous attempt at solving that problem was by David Rowley [1], > > where he proposed delaying locking of *all* partitions appearing under > > an Append/MergeAppend until "initial" pruning is done during the > > executor initialization phase. A problem with that approach that he > > has described in [2] is that leaving partitions unlocked can lead to > > race conditions where the Plan node belonging to a partition can be > > invalidated when a concurrent session successfully alters the > > partition between AcquireExecutorLocks() saying the plan is okay to > > execute and then actually executing it. > > > > However, using an idea that Robert suggested to me off-list a little > > while back, it seems possible to determine the set of partitions that > > we can safely skip locking. The idea is to look at the "initial" or > > "pre-execution" pruning instructions contained in a given Append or > > MergeAppend node when AcquireExecutorLocks() is collecting the > > relations to lock and consider relations from only those sub-nodes > > that survive performing those instructions. I've attempted > > implementing that idea in the attached patch. > > > > In which cases, we will have "pre-execution" pruning instructions that > can be used to skip locking partitions? Can you please give a few > examples where this approach will be useful? This is mainly to be useful for prepared queries, so something like: prepare q as select * from partitioned_table where key =3D $1; And that too when execute q(=E2=80=A6) uses a generic plan. Generic plans a= re problematic because it must contain nodes for all partitions (without any plan time pruning), which means CheckCachedPlan() has to spend time proportional to the number of partitions to determine that the plan is still usable / has not been invalidated; most of that is AcquireExecutorLocks(). Other bottlenecks, not addressed in this patch, pertain to some executor startup/shutdown subroutines that process the range table of a PlannedStmt in its entirety, whose length is also proportional to the number of partitions when the plan is generic. The benchmark is showing good results, indeed. Thanks. --=20 Amit Langote EDB: http://www.enterprisedb.com --00000000000095bb8805d467e5b2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
<= /div>
On Sat, Dec 25, 2021 = at 9:06 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Executing generic plans involving partitions is known to become slower=
> as partition count grows due to a number of bottlenecks, with
> AcquireExecutorLocks() showing at the top in profiles.
>
> Previous attempt at solving that problem was by David Rowley [1],
> where he proposed delaying locking of *all* partitions appearing under=
> an Append/MergeAppend until "initial" pruning is done during= the
> executor initialization phase.=C2=A0 A problem with that approach that= he
> has described in [2] is that leaving partitions unlocked can lead to > race conditions where the Plan node belonging to a partition can be > invalidated when a concurrent session successfully alters the
> partition between AcquireExecutorLocks() saying the plan is okay to > execute and then actually executing it.
>
> However, using an idea that Robert suggested to me off-list a little > while back, it seems possible to determine the set of partitions that<= br> > we can safely skip locking.=C2=A0 The idea is to look at the "ini= tial" or
> "pre-execution" pruning instructions contained in a given Ap= pend or
> MergeAppend node when AcquireExecutorLocks() is collecting the
> relations to lock and consider relations from only those sub-nodes
> that survive performing those instructions.=C2=A0 =C2=A0I've attem= pted
> implementing that idea in the attached patch.
>

In which cases, we will have "pre-execution" pruning instructions= that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?
=
This is mainly to be useful for prepared querie= s, so something like:

pr= epare q as select * from partitioned_table where key =3D $1;

And that too when execute q(=E2=80=A6= ) uses a generic plan. Generic plans are problematic because it must contai= n nodes for all partitions (without any plan time pruning), which means Che= ckCachedPlan() has to spend time proportional to the number of partitions t= o determine that the plan is still usable / has not been invalidated; most = of that is AcquireExecutorLocks().

Other bottlenecks, not addressed in this patch, pertain to some = executor startup/shutdown subroutines that process the range table of a Pla= nnedStmt in its entirety, whose length is also proportional to the number o= f partitions when the plan is generic.

The benchmark is showing good results, indeed.

Thanks.
--
--00000000000095bb8805d467e5b2--