public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tom Lane <[email protected]>
To: Justin Pryzby <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Michael Banck <[email protected]>
Cc: Laurenz Albe <[email protected]>
Cc: vignesh C <[email protected]>
Cc: Kumar, Sachin <[email protected]>
Cc: Robins Tharakan <[email protected]>
Cc: Jan Wieck <[email protected]>
Cc: Bruce Momjian <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Magnus Hagander <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: [email protected]
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date: Fri, 26 Jul 2024 18:37:10 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <ZqQIxIsaPWO6coil@pryzbyj2023>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<20240327150826.GB3994937@nathanxps13>
	<20240401191930.GA2302032@nathanxps13>
	<[email protected]>
	<ZqEND4ZcTDBmcv31@pryzbyj2023>
	<CAPpHfduUSA7c0YrTxN443SwXXCL41vmXkavYJjN8GS3Wc5Z42w@mail.gmail.com>
	<ZqQIxIsaPWO6coil@pryzbyj2023>

Justin Pryzby <[email protected]> writes:
> On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote:
>> It would be nice to identify such cases and check which memory contexts are
>> growing and why.

> I reproduced the problem with this schema:

> SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM generate_series(1,999)i);
> SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', i,i,i+1) FROM generate_series(1,999)i;

> This used over 4 GB of RAM.

Interesting.  This doesn't bloat particularly much in a regular
pg_restore, even with --transaction-size=1000; but it does in
pg_upgrade, as you say.  I found that the bloat was occurring
during these long sequences of UPDATE commands issued by pg_upgrade:

-- For binary upgrade, recreate inherited column.
UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attname = 'i'
  AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass;

-- For binary upgrade, recreate inherited column.
UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attname = 'i1'
  AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass;

-- For binary upgrade, recreate inherited column.
UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attname = 'i2'
  AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass;

I think the problem is basically that each one of these commands
causes a relcache inval, for which we can't reclaim space right
away, so that we end up consuming O(N^2) cache space for an
N-column inherited table.

It's fairly easy to fix things so that this example doesn't cause
that to happen: we just need to issue these updates as one command
not N commands per table.  See attached.  However, I fear this should
just be considered a draft, because the other code for binary upgrade
in the immediate vicinity is just as aggressively stupid and
unoptimized as this bit, and can probably also be driven to O(N^2)
behavior with enough CHECK constraints etc.  We've gone out of our way
to make ALTER TABLE capable of handling many updates to a table's DDL
in one command, but whoever wrote this code appears not to have read
that memo, or at least to have believed that performance of pg_upgrade
isn't of concern.

> Note that there seemed to be no issue when I created 999 tables without
> partitioning:
> SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM generate_series(1,999)i;

Yeah, because then we don't need to play games with attislocal.

			regards, tom lane



Attachments:

  [text/x-diff] reduce-pg_upgrade-cache-bloat-wip.patch (2.2K, 2-reduce-pg_upgrade-cache-bloat-wip.patch)
  download | inline diff:
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index b8b1888bd3..19f98bdf43 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16094,6 +16094,12 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 			 tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
 			 tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
 		{
+			bool		firstinhcol = true;
+
+			/*
+			 * Drop any dropped columns.  We don't really expect there to be a
+			 * lot, else this code would be pretty inefficient.
+			 */
 			for (j = 0; j < tbinfo->numatts; j++)
 			{
 				if (tbinfo->attisdropped[j])
@@ -16120,18 +16126,37 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 					appendPQExpBuffer(q, "DROP COLUMN %s;\n",
 									  fmtId(tbinfo->attnames[j]));
 				}
-				else if (!tbinfo->attislocal[j])
+			}
+
+			/*
+			 * Fix up inherited columns.  There could be a lot of these, so we
+			 * do the operation in a single SQL command; otherwise, we risk
+			 * O(N^2) relcache bloat thanks to repeatedly invalidating the
+			 * table's relcache entry.
+			 */
+			for (j = 0; j < tbinfo->numatts; j++)
+			{
+				if (!tbinfo->attisdropped[j] &&
+					!tbinfo->attislocal[j])
 				{
-					appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited column.\n");
-					appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n"
-										 "SET attislocal = false\n"
-										 "WHERE attname = ");
+					if (firstinhcol)
+					{
+						appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited columns.\n");
+						appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n"
+											 "SET attislocal = false\n"
+											 "WHERE attrelid = ");
+						appendStringLiteralAH(q, qualrelname, fout);
+						appendPQExpBufferStr(q, "::pg_catalog.regclass\n"
+											 "  AND attname IN (");
+						firstinhcol = false;
+					}
+					else
+						appendPQExpBufferStr(q, ", ");
 					appendStringLiteralAH(q, tbinfo->attnames[j], fout);
-					appendPQExpBufferStr(q, "\n  AND attrelid = ");
-					appendStringLiteralAH(q, qualrelname, fout);
-					appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
 				}
 			}
+			if (!firstinhcol)
+				appendPQExpBufferStr(q, ");\n");
 
 			/*
 			 * Add inherited CHECK constraints, if any.


view thread (15+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pg_upgrade failing for 200+ million Large Objects
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox