Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w4lFY-002WwK-1u for pgsql-hackers@arkaria.postgresql.org; Mon, 23 Mar 2026 19:47:44 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w4lFW-001uAg-2T for pgsql-hackers@arkaria.postgresql.org; Mon, 23 Mar 2026 19:47:43 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w4lFW-001uAV-1T for pgsql-hackers@lists.postgresql.org; Mon, 23 Mar 2026 19:47:42 +0000 Received: from mail-ot1-x334.google.com ([2607:f8b0:4864:20::334]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w4lFU-00000000kej-0UUI for pgsql-hackers@postgresql.org; Mon, 23 Mar 2026 19:47:42 +0000 Received: by mail-ot1-x334.google.com with SMTP id 46e09a7af769-7d74aa6bcdbso1739814a34.2 for ; Mon, 23 Mar 2026 12:47:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774295259; cv=none; d=google.com; s=arc-20240605; b=HQD80QricvZBmGkVL8HAwQEqUi7rtNsFkpDXwQNJyWM2uU0daYClddQDC5ymIPQGfL /HtEz7AzQPwOypogHnqsksVlwL2x5Vrmfc1r3Nm60mdQvAhO46pxHYb3Mv3GZ432PeAu axRtYuy2ROQIhgtuEQ7Z7/Da4ToqeiGPtW2/+UOP3HwKnSRIbU+VAt0AbGFQTTl91P89 CST326FOd/FC89DPRyoOwy/EFXTc84oaysZxKC7lK0QyPsjAJcTLJmxiQhorPJ+FeFk0 RS1pPMh3ydDimQxdbNG9J2aTdJcawCpbiewhi1BwfMUmUKc4OBC6SW7y8V9nJrPlPf+l wcWQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=ad3+bzfIppPfsU5q9+8BVEf5gL8cwPrybwszJneIFRA=; fh=B96RO2Qlq9k/gSGdnIL6h2VdyxyMwTSnpIZnEgQJUss=; b=dyU6cgg0NOxNDwAtdJ6SGAlwEPuC9Qa2lccpK/dv4e+aWVIVMvzaX9hSfkZhjeNWpe VDDdV5qwbfrnW0aA3vzFR+ZNo9qq6Zl3uoGkTfAjQFlCuWa0zN7Ttzy3nwM4DHXdNbEl G6BJrQZC/C/isrjCuPkhRDP4V/y46mKMQUMkePH/RhNPzkLK+t14QQUABxjUv4952k2l dpWK3TORH1u3PwfSWnKVpaahy0Z131J8q9mLQY1qq38G0iloGYaua1ki+uVjc2Z6LWNr LLA6VZF7bRv1auRjJKN6r/F2XneEISqRAy9PuUms6qPzhTVYpFFC8d/HSDdHvpwPcb5t HH1Q==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774295259; x=1774900059; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ad3+bzfIppPfsU5q9+8BVEf5gL8cwPrybwszJneIFRA=; b=nn+HEyFJsA0ov68OGkFDOAaxf/0voT2Y6GeeyIjOdragmssCfxcX7Drr2woKx36luX jQQ5Bg4KX0AwuiPBwPWujmFLrMjy/xQl6UCv+ZrV4vC60P/2X96NSHgV7hKjSK3ZaJmG S4HHT/ZBEnIixME/P5kKbdekaIy3O8BotyE9cJXskW4IBWplJXywHz+P146bYPx4mzIz m59phUSWgfKM1asj+ziFS8WzvQ2Otvepft4/672t+T83+M6L+jbZt16O9Vihscgd5AzH coej4YVVKhY4ragMk08eeamO4cyl+NAjHD4xpBL3oErOjwAwqpDUaUEfTC3v1PF+bG6w ipWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774295259; x=1774900059; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ad3+bzfIppPfsU5q9+8BVEf5gL8cwPrybwszJneIFRA=; b=D4l1kTJ4snStjKx9lJzGNIUAs6D44joIBgTVrLqdcyjnZGtSin77gCerunYdi9nUec g3kYfU5zU03zWq48WFP/rC5c5/IagZWgAAy5rF83lRPy3G+TWZKmeIcDqIv+uWE/eiPy yyVk/8S79dv+O02I1Woh3qt8Sdd2kREFnCC2CKnCNIQJJpXSfdXoj8Oqcgdo+YDezRji 9ZBP3G5JwNVkJzCXx37FED7indGnLBgtLF3HC+s0qZmSW/5qINCeMuTMWIka0LVGy46T SIuLET0phLh4pAl+PID7lno0EAeFqnA8sgVL16OsYDGfGJ5fFlf8sbh4Col0TfarGgNu ql+w== X-Forwarded-Encrypted: i=1; AJvYcCU8Y5/l2Zi8jB4jRxlqd/xd2FdE3xBlmeJilYrN5dHXdsejhK0pJcv+bGDG7/Vo8FJ6fo4iAVcZ7fawrZAz@postgresql.org X-Gm-Message-State: AOJu0YxLPWWLp782179lnhB209MHziFqOmABeF9etF+NsLBC3mxykrUi vXHqNmDzTkiIn6QZ3pqCBMXyHNRnUD+GmgRupTJXlMruKdvBe0PW8WtNL2ytgDVwvO16u61XgKp RLXDySsDR9/p4+V+KSD0wCExzemW4JxM= X-Gm-Gg: ATEYQzxEZipw6ABmPIsrEcmX6UwYae8ckKejZ/9gvdjI6vntijkqm1EAK8mr1+SrzIe bY+r34OWSFmN9hdK1DMHLwiX+F2SrX8BfiqssjbNrunla2X2V5Q9AB6NJLXfcLPX55AKcJkwML8 KQoTdmGbbLwBkMFwPaoNiHhpnhzgjokyQjPagbz57h3Kva0vOMio5CC/VzRMm2xrzy5lmMSvuVU 8Rtn+keXcnWYTHmlQGpeE43JpK1idGQl+aOeuVBSZ7FOmIeMTHmC8mqJPTZVBbjo0KNkVu77DaC 0oXW3x4= X-Received: by 2002:a05:6830:4420:b0:7d7:b7da:4449 with SMTP id 46e09a7af769-7d7eb0bf169mr8643421a34.33.1774295258850; Mon, 23 Mar 2026 12:47:38 -0700 (PDT) MIME-Version: 1.0 References: <4a3ebf7d81bfc6dd4d545e5b27d6e8f6c32d8937.camel@cybertec.at> <3023817.1710629175@sss.pgh.pa.us> <6603e4e0.500a0220.a557f.4f39@mx.google.com> <3304322.1711551245@sss.pgh.pa.us> <20240327150826.GB3994937@nathanxps13> <20240401191930.GA2302032@nathanxps13> <1217588.1711999706@sss.pgh.pa.us> <1870579.1722033430@sss.pgh.pa.us> <1873872.1722035181@sss.pgh.pa.us> <1903479.1722049686@sss.pgh.pa.us> <2422717.1722201869@sss.pgh.pa.us> In-Reply-To: From: PP L Date: Mon, 23 Mar 2026 12:47:28 -0700 X-Gm-Features: AaiRm521aG8msrLX2Zrq74vVxa_JNz990BJoWddO7M4paUug8U3esHV1Z0Y_kh4 Message-ID: Subject: Re: pg_upgrade failing for 200+ million Large Objects To: Alexander Korotkov , Tom Lane Cc: Justin Pryzby , Nathan Bossart , Michael Banck , Laurenz Albe , vignesh C , "Kumar, Sachin" , Robins Tharakan , Jan Wieck , Bruce Momjian , Andrew Dunstan , Magnus Hagander , Peter Eisentraut , pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="000000000000a168d3064db64e8e" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000a168d3064db64e8e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hello hackers, I wanted to revive this thread specifically around the attislocal optimization discussion. As part of https://github.com/postgres/postgres/commit/b3f0e0503f3, we now batch all the attislocal UPDATEs together, hence making it more performant. I think we might be able to go one step further and completely skip the attislocal UPDATE for partition tables. This is because the attislocal UPDATE is done immediately after 'CREATE TABLE', during the 'ATTACH PARTITION' step(see attislocal being set to false in MergeAttributesIntoExisting). The UPDATEs emitted by pg_dump are therefore redundant. Even with batching, the single UPDATE still modifies N(no of columns) rows causing N relcache invalidations. This same workflow is then repeated by ATTACH PARTITION causing another N relcache invalidations. Skipping the attislocal UPDATE definitely speeds up the runtime if there are a lot of partition tables because we will avoid quite a lot of relcache invalidations and rebuild calls. Since this optimization removes the attislocal UPDATE completely, the effect will be even more pronounced for wider partition tables. There's already precedent for this: * attinhcount is never explicitly set by pg_dump. It is only modified by MergeAttributesIntoExisting during ATTACH PARTITION * conislocal for CHECK constraints is explicitly not fixed for partitions. See comment "No need to fix conislocal: ATTACH PARTITION does that" in dumpTableSchema The only risk I can foresee is the window between CREATE TABLE and ATTACH PARTITION where attislocal will be incorrectly set to true. But I think this window is small enough to not worry about since ATTACH PARTITION immediately succeeds CREATE TABLE(maybe barring some other minor updates) Here is a simple patch ``` diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 137161aa5e0..d3d7403228a 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -17734,7 +17734,8 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) for (j =3D 0; j < tbinfo->numatts; j++) { if (!tbinfo->attisdropped[j] && - !tbinfo->attislocal[j]) + !tbinfo->attislocal[j] && + !tbinfo->ispartition) { if (firstitem) { ``` I tried a few experiments on my local macbook and noticed an improvement of around ~33-36% (% can vary mostly depending on the number of columns) Setup(master branch): 300 partitioned root tables with 200 leaves each =3D 60000 partition tables 300 columns per partition: baseline : ~30 mins with patch : ~20 mins (~33% faster) 700 columns per partition: baseline : ~66 mins with patch : ~42 mins (~36% faster) Thanks Nikhil Broadcom Inc. On Mon, 23 Mar 2026 at 11:50, Alexander Korotkov wrote: > On Mon, Jul 29, 2024 at 12:24=E2=80=AFAM Tom Lane wro= te: > > So I'm forced to the conclusion that we'd better make the transaction > > size adaptive as per Alexander's suggestion. > > > > In addition to the patches attached, I experimented with making > > dumpTableSchema fold all the ALTER TABLE commands for a single table > > into one command. That's do-able without too much effort, but I'm now > > convinced that we shouldn't. It would break the semicolon-counting > > hack for detecting that tables like these involve extra work. > > I'm also not very confident that the backend won't have trouble with > > ALTER TABLE commands containing hundreds of subcommands. That's > > something we ought to work on probably, but it's not a project that > > I want to condition v17 pg_upgrade's stability on. > > > > Anyway, proposed patches attached. 0001 is some trivial cleanup > > that I noticed while working on the failed single-ALTER-TABLE idea. > > 0002 merges the catalog-UPDATE commands that dumpTableSchema issues, > > and 0003 is Alexander's suggestion. > > Nice to see you picked up my idea. I took a look over the patchset. > Looks good to me. > > ------ > Regards, > Alexander Korotkov > Supabase > > > > > --000000000000a168d3064db64e8e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hello hackers,

I wante= d to revive this thread specifically around the attislocal optimization dis= cussion.=C2=A0As part of https://github.com/postgres/postgres/commit/b3f0e0= 503f3, we now batch all the attislocal UPDATEs together, hence making i= t more performant.

I think we might be able to go one step further= and completely skip the attislocal UPDATE for partition tables. This is be= cause the attislocal UPDATE is done immediately after 'CREATE TABLE'= ;, during the 'ATTACH PARTITION' step(see attislocal being set to f= alse in MergeAttributesIntoExisting). The UPDATEs emitted by pg_dump are th= erefore redundant. Even with batching, the single UPDATE=C2=A0still modifie= s N(no of columns) rows causing N relcache invalidations. This same workflo= w is then repeated by ATTACH PARTITION causing another N relcache invalidat= ions.

Skipping the attislocal UPDATE definitely speeds up the runtim= e if there are a lot of partition tables because we will avoid quite a lot = of relcache invalidations and rebuild calls. Since this optimization remove= s the attislocal UPDATE completely, the effect will be even more pronounced= for wider partition tables.

There's already precedent for this:=
* attinhcount is never explicitly set by pg_dump. It is only modified b= y MergeAttributesIntoExisting during ATTACH PARTITION
* conislocal for C= HECK constraints is explicitly not fixed for partitions. See comment "= No need to fix conislocal: ATTACH PARTITION does that" in dumpTableSch= ema

The only risk I can foresee is the window between CREATE TABLE a= nd ATTACH PARTITION where attislocal will be incorrectly set to true. But I= think this window is small enough to not worry about since ATTACH PARTITIO= N immediately succeeds CREATE TABLE(maybe barring some other minor updates)=

Here is a simple patch
```
diff --git a/src/bin/pg_dump/pg_du= mp.c b/src/bin/pg_dump/pg_dump.c
index 137161aa5e0..d3d7403228a 100644--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ = -17734,7 +17734,8 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo= )
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 for (j =3D 0; j < tbinfo->numatts; j++)
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!tbinfo->attisdropped[j] &= amp;&
- =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 != tbinfo->attislocal[j])
+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 !tbinfo->attislocal[j] &&
+ =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 !tbinfo->ispartition)
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (firstitem)
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {
```

I tried a few= experiments on my local macbook and noticed an improvement of around ~33-3= 6% (% can vary mostly depending on the number of columns)

Setup(mast= er branch): 300 partitioned root tables with 200 leaves each =3D 60000 part= ition tables

300 columns per partition:
=C2=A0 =C2=A0 baseline=C2= =A0 =C2=A0 : ~30 mins
=C2=A0 =C2=A0 with patch : ~20 mins (~33% faster)<= br>
700 columns per partition:
=C2=A0 =C2=A0 baseline =C2=A0 : ~66 mi= ns
=C2=A0 =C2=A0 with patch : ~42 mins (~36% faster)

<= div>Thanks
Nikhil
Broadcom Inc.


On= Mon, 23 Mar 2026 at 11:50, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jul 29, 2024 at 12:24=E2=80=AFA= M Tom Lane <tgl@s= ss.pgh.pa.us> wrote:
> So I'm forced to the conclusion that we'd better make the tran= saction
> size adaptive as per Alexander's suggestion.
>
> In addition to the patches attached, I experimented with making
> dumpTableSchema fold all the ALTER TABLE commands for a single table > into one command.=C2=A0 That's do-able without too much effort, bu= t I'm now
> convinced that we shouldn't.=C2=A0 It would break the semicolon-co= unting
> hack for detecting that tables like these involve extra work.
> I'm also not very confident that the backend won't have troubl= e with
> ALTER TABLE commands containing hundreds of subcommands.=C2=A0 That= 9;s
> something we ought to work on probably, but it's not a project tha= t
> I want to condition v17 pg_upgrade's stability on.
>
> Anyway, proposed patches attached.=C2=A0 0001 is some trivial cleanup<= br> > that I noticed while working on the failed single-ALTER-TABLE idea. > 0002 merges the catalog-UPDATE commands that dumpTableSchema issues, > and 0003 is Alexander's suggestion.

Nice to see you picked up my idea.=C2=A0 I took a look over the patchset. Looks good to me.

------
Regards,
Alexander Korotkov
Supabase




--000000000000a168d3064db64e8e--