Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sXTZ0-004FD1-GS for pgsql-hackers@arkaria.postgresql.org; Fri, 26 Jul 2024 22:37:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sXTYz-000pGo-2L for pgsql-hackers@arkaria.postgresql.org; Fri, 26 Jul 2024 22:37:25 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sXTYy-000pGf-Q3 for pgsql-hackers@lists.postgresql.org; Fri, 26 Jul 2024 22:37:24 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sXTYv-001eQ2-Vv for pgsql-hackers@postgresql.org; Fri, 26 Jul 2024 22:37:23 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 46QMbAMa1870580; Fri, 26 Jul 2024 18:37:10 -0400 From: Tom Lane To: Justin Pryzby cc: Alexander Korotkov , Nathan Bossart , Michael Banck , Laurenz Albe , vignesh C , "Kumar, Sachin" , Robins Tharakan , Jan Wieck , Bruce Momjian , Andrew Dunstan , Magnus Hagander , Peter Eisentraut , pgsql-hackers@postgresql.org Subject: Re: pg_upgrade failing for 200+ million Large Objects In-reply-to: References: <4a3ebf7d81bfc6dd4d545e5b27d6e8f6c32d8937.camel@cybertec.at> <3023817.1710629175@sss.pgh.pa.us> <6603e4e0.500a0220.a557f.4f39@mx.google.com> <3304322.1711551245@sss.pgh.pa.us> <20240327150826.GB3994937@nathanxps13> <20240401191930.GA2302032@nathanxps13> <1217588.1711999706@sss.pgh.pa.us> Comments: In-reply-to Justin Pryzby message dated "Fri, 26 Jul 2024 15:36:20 -0500" MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0" Content-ID: <1870532.1722033398.0@sss.pgh.pa.us> Date: Fri, 26 Jul 2024 18:37:10 -0400 Message-ID: <1870579.1722033430@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk ------- =_aaaaaaaaaa0 Content-Type: text/plain; charset="us-ascii" Content-ID: <1870532.1722033398.1@sss.pgh.pa.us> Content-Transfer-Encoding: quoted-printable Justin Pryzby writes: > On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote: >> It would be nice to identify such cases and check which memory contexts= are >> growing and why. > I reproduced the problem with this schema: > SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', array_t= o_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM gene= rate_series(1,999)i); > SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s= )', i,i,i+1) FROM generate_series(1,999)i; > This used over 4 GB of RAM. Interesting. This doesn't bloat particularly much in a regular pg_restore, even with --transaction-size=3D1000; but it does in pg_upgrade, as you say. I found that the bloat was occurring during these long sequences of UPDATE commands issued by pg_upgrade: -- For binary upgrade, recreate inherited column. UPDATE pg_catalog.pg_attribute SET attislocal =3D false WHERE attname =3D 'i' AND attrelid =3D '\"public\".\"t139\"'::pg_catalog.regclass; -- For binary upgrade, recreate inherited column. UPDATE pg_catalog.pg_attribute SET attislocal =3D false WHERE attname =3D 'i1' AND attrelid =3D '\"public\".\"t139\"'::pg_catalog.regclass; -- For binary upgrade, recreate inherited column. UPDATE pg_catalog.pg_attribute SET attislocal =3D false WHERE attname =3D 'i2' AND attrelid =3D '\"public\".\"t139\"'::pg_catalog.regclass; I think the problem is basically that each one of these commands causes a relcache inval, for which we can't reclaim space right away, so that we end up consuming O(N^2) cache space for an N-column inherited table. It's fairly easy to fix things so that this example doesn't cause that to happen: we just need to issue these updates as one command not N commands per table. See attached. However, I fear this should just be considered a draft, because the other code for binary upgrade in the immediate vicinity is just as aggressively stupid and unoptimized as this bit, and can probably also be driven to O(N^2) behavior with enough CHECK constraints etc. We've gone out of our way to make ALTER TABLE capable of handling many updates to a table's DDL in one command, but whoever wrote this code appears not to have read that memo, or at least to have believed that performance of pg_upgrade isn't of concern. > Note that there seemed to be no issue when I created 999 tables without > partitioning: > SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM generate_series(= 1,999)i; Yeah, because then we don't need to play games with attislocal. regards, tom lane ------- =_aaaaaaaaaa0 Content-Type: text/x-diff; name="reduce-pg_upgrade-cache-bloat-wip.patch"; charset="us-ascii" Content-ID: <1870532.1722033398.2@sss.pgh.pa.us> Content-Description: reduce-pg_upgrade-cache-bloat-wip.patch Content-Transfer-Encoding: quoted-printable diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index b8b1888bd3..19f98bdf43 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -16094,6 +16094,12 @@ dumpTableSchema(Archive *fout, const TableInfo *t= binfo) tbinfo->relkind =3D=3D RELKIND_FOREIGN_TABLE || tbinfo->relkind =3D=3D RELKIND_PARTITIONED_TABLE)) { + bool firstinhcol =3D true; + + /* + * Drop any dropped columns. We don't really expect there to be a + * lot, else this code would be pretty inefficient. + */ for (j =3D 0; j < tbinfo->numatts; j++) { if (tbinfo->attisdropped[j]) @@ -16120,18 +16126,37 @@ dumpTableSchema(Archive *fout, const TableInfo *= tbinfo) appendPQExpBuffer(q, "DROP COLUMN %s;\n", fmtId(tbinfo->attnames[j])); } - else if (!tbinfo->attislocal[j]) + } + + /* + * Fix up inherited columns. There could be a lot of these, so we + * do the operation in a single SQL command; otherwise, we risk + * O(N^2) relcache bloat thanks to repeatedly invalidating the + * table's relcache entry. + */ + for (j =3D 0; j < tbinfo->numatts; j++) + { + if (!tbinfo->attisdropped[j] && + !tbinfo->attislocal[j]) { - appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited= column.\n"); - appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n" - "SET attislocal =3D false\n" - "WHERE attname =3D "); + if (firstinhcol) + { + appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherite= d columns.\n"); + appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n" + "SET attislocal =3D false\n" + "WHERE attrelid =3D "); + appendStringLiteralAH(q, qualrelname, fout); + appendPQExpBufferStr(q, "::pg_catalog.regclass\n" + " AND attname IN ("); + firstinhcol =3D false; + } + else + appendPQExpBufferStr(q, ", "); appendStringLiteralAH(q, tbinfo->attnames[j], fout); - appendPQExpBufferStr(q, "\n AND attrelid =3D "); - appendStringLiteralAH(q, qualrelname, fout); - appendPQExpBufferStr(q, "::pg_catalog.regclass;\n"); } } + if (!firstinhcol) + appendPQExpBufferStr(q, ");\n"); = /* * Add inherited CHECK constraints, if any. ------- =_aaaaaaaaaa0--