public inbox for [email protected]help / color / mirror / Atom feed
Re: pg_upgrade failing for 200+ million Large Objects 15+ messages / 5 participants [nested] [flat]
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-04-01 19:28 Tom Lane <[email protected]> 0 siblings, 2 replies; 15+ messages in thread From: Tom Lane @ 2024-04-01 19:28 UTC (permalink / raw) To: Nathan Bossart <[email protected]>; +Cc: Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers Nathan Bossart <[email protected]> writes: > Sorry for taking so long to get back to this one. Overall, I think the > code is in decent shape. Thanks for looking at it! > The one design point that worries me a little is the non-configurability of > --transaction-size in pg_upgrade. I think it's fine to default it to 1,000 > or something, but given how often I've had to fiddle with > max_locks_per_transaction, I'm wondering if we might regret hard-coding it. Well, we could add a command-line switch to pg_upgrade, but I'm unconvinced that it'd be worth the trouble. I think a very large fraction of users invoke pg_upgrade by means of packager-supplied scripts that are unlikely to provide a way to pass through such a switch. I'm inclined to say let's leave it as-is until we get some actual field requests for a switch. regards, tom lane ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-04-01 19:37 Nathan Bossart <[email protected]> parent: Tom Lane <[email protected]> 1 sibling, 0 replies; 15+ messages in thread From: Nathan Bossart @ 2024-04-01 19:37 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers On Mon, Apr 01, 2024 at 03:28:26PM -0400, Tom Lane wrote: > Nathan Bossart <[email protected]> writes: >> The one design point that worries me a little is the non-configurability of >> --transaction-size in pg_upgrade. I think it's fine to default it to 1,000 >> or something, but given how often I've had to fiddle with >> max_locks_per_transaction, I'm wondering if we might regret hard-coding it. > > Well, we could add a command-line switch to pg_upgrade, but I'm > unconvinced that it'd be worth the trouble. I think a very large > fraction of users invoke pg_upgrade by means of packager-supplied > scripts that are unlikely to provide a way to pass through such > a switch. I'm inclined to say let's leave it as-is until we get > some actual field requests for a switch. Okay. I'll let you know if I see anything. IIRC usually the pg_dump side of pg_upgrade is more prone to lock exhaustion, so you may very well be right that this is unnecessary. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-24 14:17 Justin Pryzby <[email protected]> parent: Tom Lane <[email protected]> 1 sibling, 1 reply; 15+ messages in thread From: Justin Pryzby @ 2024-07-24 14:17 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers On Mon, Apr 01, 2024 at 03:28:26PM -0400, Tom Lane wrote: > Nathan Bossart <[email protected]> writes: > > The one design point that worries me a little is the non-configurability of > > --transaction-size in pg_upgrade. I think it's fine to default it to 1,000 > > or something, but given how often I've had to fiddle with > > max_locks_per_transaction, I'm wondering if we might regret hard-coding it. > > Well, we could add a command-line switch to pg_upgrade, but I'm > unconvinced that it'd be worth the trouble. I think a very large > fraction of users invoke pg_upgrade by means of packager-supplied > scripts that are unlikely to provide a way to pass through such > a switch. I'm inclined to say let's leave it as-is until we get > some actual field requests for a switch. I've been importing our schemas and doing upgrade testing, and was surprised when a postgres backend was killed for OOM during pg_upgrade: Killed process 989302 (postgres) total-vm:5495648kB, anon-rss:5153292kB, ... Upgrading from v16 => v16 doesn't use nearly as much RAM. While tracking down the responsible commit, I reproduced the problem using a subset of tables; at 959b38d770, the backend process used ~650 MB RAM, and at its parent commit used at most ~120 MB. 959b38d770b Invent --transaction-size option for pg_restore. By changing RESTORE_TRANSACTION_SIZE to 100, backend RAM use goes to 180 MB during pg_upgrade, which is reasonable. With partitioning, we have a lot of tables, some of them wide (126 partitioned tables, 8942 childs, total 1019315 columns). I didn't track if certain parts of our schema contribute most to the high backend mem use, just that it's now 5x (while testing a subset) to 50x higher. We'd surely prefer that the transaction size be configurable. -- Justin ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-26 19:53 Alexander Korotkov <[email protected]> parent: Justin Pryzby <[email protected]> 0 siblings, 2 replies; 15+ messages in thread From: Alexander Korotkov @ 2024-07-26 19:53 UTC (permalink / raw) To: Justin Pryzby <[email protected]>; +Cc: Tom Lane <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers Hi, Justin! Thank you for sharing this. On Wed, Jul 24, 2024 at 5:18 PM Justin Pryzby <[email protected]> wrote: > On Mon, Apr 01, 2024 at 03:28:26PM -0400, Tom Lane wrote: > > Nathan Bossart <[email protected]> writes: > > > The one design point that worries me a little is the non-configurability of > > > --transaction-size in pg_upgrade. I think it's fine to default it to 1,000 > > > or something, but given how often I've had to fiddle with > > > max_locks_per_transaction, I'm wondering if we might regret hard-coding it. > > > > Well, we could add a command-line switch to pg_upgrade, but I'm > > unconvinced that it'd be worth the trouble. I think a very large > > fraction of users invoke pg_upgrade by means of packager-supplied > > scripts that are unlikely to provide a way to pass through such > > a switch. I'm inclined to say let's leave it as-is until we get > > some actual field requests for a switch. > > I've been importing our schemas and doing upgrade testing, and was > surprised when a postgres backend was killed for OOM during pg_upgrade: > > Killed process 989302 (postgres) total-vm:5495648kB, anon-rss:5153292kB, ... > > Upgrading from v16 => v16 doesn't use nearly as much RAM. > > While tracking down the responsible commit, I reproduced the problem > using a subset of tables; at 959b38d770, the backend process used > ~650 MB RAM, and at its parent commit used at most ~120 MB. > > 959b38d770b Invent --transaction-size option for pg_restore. > > By changing RESTORE_TRANSACTION_SIZE to 100, backend RAM use goes to > 180 MB during pg_upgrade, which is reasonable. > > With partitioning, we have a lot of tables, some of them wide (126 > partitioned tables, 8942 childs, total 1019315 columns). I didn't track > if certain parts of our schema contribute most to the high backend mem > use, just that it's now 5x (while testing a subset) to 50x higher. Do you think there is a way to anonymize the schema and share it? > We'd surely prefer that the transaction size be configurable. I think we can add an option to pg_upgrade. But I wonder if there is something else we can do. It seems that restoring some objects is much more expensive than restoring others. It would be nice to identify such cases and check which memory contexts are growing and why. It would be helpful if you could share your data schema, so we could dig into it. I can imagine we need to count some DDL commands in aspect of maximum restore transaction size in a different way than others. Also, we probably need to change the default restore transaction size. ------ Regards, Alexander Korotkov Supabase ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-26 20:05 Tom Lane <[email protected]> parent: Alexander Korotkov <[email protected]> 1 sibling, 0 replies; 15+ messages in thread From: Tom Lane @ 2024-07-26 20:05 UTC (permalink / raw) To: Alexander Korotkov <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers Alexander Korotkov <[email protected]> writes: > On Wed, Jul 24, 2024 at 5:18 PM Justin Pryzby <[email protected]> wrote: >> We'd surely prefer that the transaction size be configurable. > I think we can add an option to pg_upgrade. But I wonder if there is > something else we can do. Yeah, I'm not enamored of adding a command-line option, if only because I think a lot of people invoke pg_upgrade through vendor-provided scripts that aren't going to cooperate with that. If we can find some way to make it adapt without help, that would be much better. regards, tom lane ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-26 20:36 Justin Pryzby <[email protected]> parent: Alexander Korotkov <[email protected]> 1 sibling, 2 replies; 15+ messages in thread From: Justin Pryzby @ 2024-07-26 20:36 UTC (permalink / raw) To: Alexander Korotkov <[email protected]>; +Cc: Tom Lane <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers On Wed, Jul 24, 2024 at 09:17:51AM -0500, Justin Pryzby wrote: > With partitioning, we have a lot of tables, some of them wide (126 > partitioned tables, 8942 childs, total 1019315 columns). On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote: > It would be nice to identify such cases and check which memory contexts are > growing and why. I reproduced the problem with this schema: SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM generate_series(1,999)i); SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', i,i,i+1) FROM generate_series(1,999)i; This used over 4 GB of RAM. 3114201 pryzbyj 20 0 5924520 4.2g 32476 T 0.0 53.8 0:27.35 postgres: pryzbyj postgres [local] UPDATE The large context is: 2024-07-26 15:22:19.280 CDT [3114201] LOG: level: 1; CacheMemoryContext: 5211209088 total in 50067 blocks; 420688 free (14 chunks); 5210788400 used Note that there seemed to be no issue when I created 999 tables without partitioning: SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM generate_series(1,999)i; -- Justin ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-26 21:42 Alexander Korotkov <[email protected]> parent: Justin Pryzby <[email protected]> 1 sibling, 0 replies; 15+ messages in thread From: Alexander Korotkov @ 2024-07-26 21:42 UTC (permalink / raw) To: Justin Pryzby <[email protected]>; +Cc: Tom Lane <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers On Fri, Jul 26, 2024 at 11:36 PM Justin Pryzby <[email protected]> wrote: > On Wed, Jul 24, 2024 at 09:17:51AM -0500, Justin Pryzby wrote: > > With partitioning, we have a lot of tables, some of them wide (126 > > partitioned tables, 8942 childs, total 1019315 columns). > > On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote: > > It would be nice to identify such cases and check which memory contexts are > > growing and why. > > I reproduced the problem with this schema: > > SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM generate_series(1,999)i); > SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', i,i,i+1) FROM generate_series(1,999)i; > > This used over 4 GB of RAM. > 3114201 pryzbyj 20 0 5924520 4.2g 32476 T 0.0 53.8 0:27.35 postgres: pryzbyj postgres [local] UPDATE > > The large context is: > 2024-07-26 15:22:19.280 CDT [3114201] LOG: level: 1; CacheMemoryContext: 5211209088 total in 50067 blocks; 420688 free (14 chunks); 5210788400 used > > Note that there seemed to be no issue when I created 999 tables without > partitioning: > > SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM generate_series(1,999)i; Thank you! That was quick. I'm looking into this. ------ Regards, Alexander Korotkov Supabase ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-26 22:37 Tom Lane <[email protected]> parent: Justin Pryzby <[email protected]> 1 sibling, 1 reply; 15+ messages in thread From: Tom Lane @ 2024-07-26 22:37 UTC (permalink / raw) To: Justin Pryzby <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers Justin Pryzby <[email protected]> writes: > On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote: >> It would be nice to identify such cases and check which memory contexts are >> growing and why. > I reproduced the problem with this schema: > SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM generate_series(1,999)i); > SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', i,i,i+1) FROM generate_series(1,999)i; > This used over 4 GB of RAM. Interesting. This doesn't bloat particularly much in a regular pg_restore, even with --transaction-size=1000; but it does in pg_upgrade, as you say. I found that the bloat was occurring during these long sequences of UPDATE commands issued by pg_upgrade: -- For binary upgrade, recreate inherited column. UPDATE pg_catalog.pg_attribute SET attislocal = false WHERE attname = 'i' AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass; -- For binary upgrade, recreate inherited column. UPDATE pg_catalog.pg_attribute SET attislocal = false WHERE attname = 'i1' AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass; -- For binary upgrade, recreate inherited column. UPDATE pg_catalog.pg_attribute SET attislocal = false WHERE attname = 'i2' AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass; I think the problem is basically that each one of these commands causes a relcache inval, for which we can't reclaim space right away, so that we end up consuming O(N^2) cache space for an N-column inherited table. It's fairly easy to fix things so that this example doesn't cause that to happen: we just need to issue these updates as one command not N commands per table. See attached. However, I fear this should just be considered a draft, because the other code for binary upgrade in the immediate vicinity is just as aggressively stupid and unoptimized as this bit, and can probably also be driven to O(N^2) behavior with enough CHECK constraints etc. We've gone out of our way to make ALTER TABLE capable of handling many updates to a table's DDL in one command, but whoever wrote this code appears not to have read that memo, or at least to have believed that performance of pg_upgrade isn't of concern. > Note that there seemed to be no issue when I created 999 tables without > partitioning: > SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM generate_series(1,999)i; Yeah, because then we don't need to play games with attislocal. regards, tom lane Attachments: [text/x-diff] reduce-pg_upgrade-cache-bloat-wip.patch (2.2K, 2-reduce-pg_upgrade-cache-bloat-wip.patch) download | inline diff: diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index b8b1888bd3..19f98bdf43 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -16094,6 +16094,12 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) tbinfo->relkind == RELKIND_FOREIGN_TABLE || tbinfo->relkind == RELKIND_PARTITIONED_TABLE)) { + bool firstinhcol = true; + + /* + * Drop any dropped columns. We don't really expect there to be a + * lot, else this code would be pretty inefficient. + */ for (j = 0; j < tbinfo->numatts; j++) { if (tbinfo->attisdropped[j]) @@ -16120,18 +16126,37 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) appendPQExpBuffer(q, "DROP COLUMN %s;\n", fmtId(tbinfo->attnames[j])); } - else if (!tbinfo->attislocal[j]) + } + + /* + * Fix up inherited columns. There could be a lot of these, so we + * do the operation in a single SQL command; otherwise, we risk + * O(N^2) relcache bloat thanks to repeatedly invalidating the + * table's relcache entry. + */ + for (j = 0; j < tbinfo->numatts; j++) + { + if (!tbinfo->attisdropped[j] && + !tbinfo->attislocal[j]) { - appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited column.\n"); - appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n" - "SET attislocal = false\n" - "WHERE attname = "); + if (firstinhcol) + { + appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited columns.\n"); + appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n" + "SET attislocal = false\n" + "WHERE attrelid = "); + appendStringLiteralAH(q, qualrelname, fout); + appendPQExpBufferStr(q, "::pg_catalog.regclass\n" + " AND attname IN ("); + firstinhcol = false; + } + else + appendPQExpBufferStr(q, ", "); appendStringLiteralAH(q, tbinfo->attnames[j], fout); - appendPQExpBufferStr(q, "\n AND attrelid = "); - appendStringLiteralAH(q, qualrelname, fout); - appendPQExpBufferStr(q, "::pg_catalog.regclass;\n"); } } + if (!firstinhcol) + appendPQExpBufferStr(q, ");\n"); /* * Add inherited CHECK constraints, if any. ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-26 22:55 Alexander Korotkov <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 15+ messages in thread From: Alexander Korotkov @ 2024-07-26 22:55 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers On Sat, Jul 27, 2024 at 1:37 AM Tom Lane <[email protected]> wrote: > Justin Pryzby <[email protected]> writes: > > On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote: > >> It would be nice to identify such cases and check which memory contexts are > >> growing and why. > > > I reproduced the problem with this schema: > > > SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM generate_series(1,999)i); > > SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', i,i,i+1) FROM generate_series(1,999)i; > > > This used over 4 GB of RAM. > > Interesting. This doesn't bloat particularly much in a regular > pg_restore, even with --transaction-size=1000; but it does in > pg_upgrade, as you say. I found that the bloat was occurring > during these long sequences of UPDATE commands issued by pg_upgrade: > > -- For binary upgrade, recreate inherited column. > UPDATE pg_catalog.pg_attribute > SET attislocal = false > WHERE attname = 'i' > AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass; > > -- For binary upgrade, recreate inherited column. > UPDATE pg_catalog.pg_attribute > SET attislocal = false > WHERE attname = 'i1' > AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass; > > -- For binary upgrade, recreate inherited column. > UPDATE pg_catalog.pg_attribute > SET attislocal = false > WHERE attname = 'i2' > AND attrelid = '\"public\".\"t139\"'::pg_catalog.regclass; > > I think the problem is basically that each one of these commands > causes a relcache inval, for which we can't reclaim space right > away, so that we end up consuming O(N^2) cache space for an > N-column inherited table. I was about to report the same. > It's fairly easy to fix things so that this example doesn't cause > that to happen: we just need to issue these updates as one command > not N commands per table. See attached. However, I fear this should > just be considered a draft, because the other code for binary upgrade > in the immediate vicinity is just as aggressively stupid and > unoptimized as this bit, and can probably also be driven to O(N^2) > behavior with enough CHECK constraints etc. We've gone out of our way > to make ALTER TABLE capable of handling many updates to a table's DDL > in one command, but whoever wrote this code appears not to have read > that memo, or at least to have believed that performance of pg_upgrade > isn't of concern. I was thinking about counting actual number of queries, not TOC entries for transaction number as a more universal solution. But that would require usage of psql_scan() or writing simpler alternative for this particular purpose. That looks quite annoying. What do you think? ------ Regards, Alexander Korotkov Supabase ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-26 23:06 Tom Lane <[email protected]> parent: Alexander Korotkov <[email protected]> 0 siblings, 1 reply; 15+ messages in thread From: Tom Lane @ 2024-07-26 23:06 UTC (permalink / raw) To: Alexander Korotkov <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers Alexander Korotkov <[email protected]> writes: > On Sat, Jul 27, 2024 at 1:37 AM Tom Lane <[email protected]> wrote: >> It's fairly easy to fix things so that this example doesn't cause >> that to happen: we just need to issue these updates as one command >> not N commands per table. > I was thinking about counting actual number of queries, not TOC > entries for transaction number as a more universal solution. But that > would require usage of psql_scan() or writing simpler alternative for > this particular purpose. That looks quite annoying. What do you > think? The assumption underlying what we're doing now is that the number of SQL commands per TOC entry is limited. I'd prefer to fix the code so that that assumption is correct, at least in normal cases. I confess I'd not looked closely enough at the binary-upgrade support code to realize it wasn't correct already :-(. If we go that way, we can fix this while also making pg_upgrade faster rather than slower. I also expect that it'll be a lot simpler than putting a full SQL parser in pg_restore. regards, tom lane ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-27 03:00 Alexander Korotkov <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 15+ messages in thread From: Alexander Korotkov @ 2024-07-27 03:00 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers On Sat, Jul 27, 2024 at 2:06 AM Tom Lane <[email protected]> wrote: > Alexander Korotkov <[email protected]> writes: > > On Sat, Jul 27, 2024 at 1:37 AM Tom Lane <[email protected]> wrote: > >> It's fairly easy to fix things so that this example doesn't cause > >> that to happen: we just need to issue these updates as one command > >> not N commands per table. > > > I was thinking about counting actual number of queries, not TOC > > entries for transaction number as a more universal solution. But that > > would require usage of psql_scan() or writing simpler alternative for > > this particular purpose. That looks quite annoying. What do you > > think? > > The assumption underlying what we're doing now is that the number > of SQL commands per TOC entry is limited. I'd prefer to fix the > code so that that assumption is correct, at least in normal cases. > I confess I'd not looked closely enough at the binary-upgrade support > code to realize it wasn't correct already :-(. If we go that way, > we can fix this while also making pg_upgrade faster rather than > slower. I also expect that it'll be a lot simpler than putting > a full SQL parser in pg_restore. I'm good with that as soon as we're not going to meet many cases of high number SQL commands per TOC entry. J4F, I have an idea to count number of ';' sings and use it for transaction size counter, since it is as upper bound estimate of number of SQL commands :-) ------ Regards, Alexander Korotkov Supabase ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-27 03:08 Tom Lane <[email protected]> parent: Alexander Korotkov <[email protected]> 0 siblings, 1 reply; 15+ messages in thread From: Tom Lane @ 2024-07-27 03:08 UTC (permalink / raw) To: Alexander Korotkov <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers Alexander Korotkov <[email protected]> writes: > J4F, I have an idea to count number of ';' sings and use it for > transaction size counter, since it is as upper bound estimate of > number of SQL commands :-) Hmm ... that's not a completely silly idea. Let's keep it in the back pocket in case we can't easily reduce the number of SQL commands in some cases. It's late here, and I've got some other commitments tomorrow, but I'll try to produce a patch to merge more of the SQL commands in a day or two. regards, tom lane ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-28 21:24 Tom Lane <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 15+ messages in thread From: Tom Lane @ 2024-07-28 21:24 UTC (permalink / raw) To: Alexander Korotkov <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers I wrote: > Alexander Korotkov <[email protected]> writes: >> J4F, I have an idea to count number of ';' sings and use it for >> transaction size counter, since it is as upper bound estimate of >> number of SQL commands :-) > Hmm ... that's not a completely silly idea. Let's keep it in > the back pocket in case we can't easily reduce the number of > SQL commands in some cases. After poking at this for awhile, we can fix Justin's example case by avoiding repeated UPDATEs on pg_attribute, so I think we should do that. It seems clearly a win, with no downside other than a small increment of complexity in pg_dump. However, that's probably not sufficient to mark this issue as closed. It seems likely that there are other patterns that would cause backend memory bloat. One case that I found is tables with a lot of inherited constraints (not partitions, but old-style inheritance). For example, load the output of this Perl script into a database: ----- for (my $i = 0; $i < 100; $i++) { print "CREATE TABLE test_inh_check$i (\n"; for (my $j = 0; $j < 1000; $j++) { print "a$j float check (a$j > 10.2),\n"; } print "b float);\n"; print "CREATE TABLE test_inh_check_child$i() INHERITS(test_inh_check$i);\n"; } ----- pg_dump is horrendously slow on this, thanks to O(N^2) behavior in ruleutils.c, and pg_upgrade is worse --- and leaks memory too in HEAD/v17. The slowness was there before, so I think the lack of field complaints indicates that this isn't a real-world use case. Still, it's bad if pg_upgrade fails when it would not have before, and there may be other similar issues. So I'm forced to the conclusion that we'd better make the transaction size adaptive as per Alexander's suggestion. In addition to the patches attached, I experimented with making dumpTableSchema fold all the ALTER TABLE commands for a single table into one command. That's do-able without too much effort, but I'm now convinced that we shouldn't. It would break the semicolon-counting hack for detecting that tables like these involve extra work. I'm also not very confident that the backend won't have trouble with ALTER TABLE commands containing hundreds of subcommands. That's something we ought to work on probably, but it's not a project that I want to condition v17 pg_upgrade's stability on. Anyway, proposed patches attached. 0001 is some trivial cleanup that I noticed while working on the failed single-ALTER-TABLE idea. 0002 merges the catalog-UPDATE commands that dumpTableSchema issues, and 0003 is Alexander's suggestion. regards, tom lane Attachments: [text/x-diff] v1-0001-Fix-missing-ONLY-in-one-dumpTableSchema-command.patch (1.8K, 2-v1-0001-Fix-missing-ONLY-in-one-dumpTableSchema-command.patch) download | inline diff: From 34ebed72e9029f690e5d3f0cb7464670e83caa5c Mon Sep 17 00:00:00 2001 From: Tom Lane <[email protected]> Date: Sun, 28 Jul 2024 13:02:27 -0400 Subject: [PATCH v1 1/3] Fix missing ONLY in one dumpTableSchema command. The various ALTER [FOREIGN] TABLE commands issued by dumpTableSchema all use ONLY, except for this one. I think it's not a live bug because we don't permit foreign tables to have children, but it's still inconsistent. I happened across this while refactoring dumpTableSchema to merge all its ALTERs into one; while I'm not certain we should actually do that, this seems like good cleanup. --- src/bin/pg_dump/pg_dump.c | 2 +- src/bin/pg_dump/t/002_pg_dump.pl | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index b8b1888bd3..7cd6a7fb97 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -16344,7 +16344,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) if (tbinfo->relkind == RELKIND_FOREIGN_TABLE && tbinfo->attfdwoptions[j][0] != '\0') appendPQExpBuffer(q, - "ALTER FOREIGN TABLE %s ALTER COLUMN %s OPTIONS (\n" + "ALTER FOREIGN TABLE ONLY %s ALTER COLUMN %s OPTIONS (\n" " %s\n" ");\n", qualrelname, diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl index d3dd8784d6..5bcc2244d5 100644 --- a/src/bin/pg_dump/t/002_pg_dump.pl +++ b/src/bin/pg_dump/t/002_pg_dump.pl @@ -1154,7 +1154,7 @@ my %tests = ( 'ALTER FOREIGN TABLE foreign_table ALTER COLUMN c1 OPTIONS' => { regexp => qr/^ - \QALTER FOREIGN TABLE dump_test.foreign_table ALTER COLUMN c1 OPTIONS (\E\n + \QALTER FOREIGN TABLE ONLY dump_test.foreign_table ALTER COLUMN c1 OPTIONS (\E\n \s+\Qcolumn_name 'col1'\E\n \Q);\E\n /xm, -- 2.43.5 [text/x-diff] v1-0002-Reduce-number-of-commands-dumpTableSchema-emits-f.patch (8.5K, 3-v1-0002-Reduce-number-of-commands-dumpTableSchema-emits-f.patch) download | inline diff: From c8f0d0252e292f276fe9631ae31e6aea11d294d2 Mon Sep 17 00:00:00 2001 From: Tom Lane <[email protected]> Date: Sun, 28 Jul 2024 16:19:48 -0400 Subject: [PATCH v1 2/3] Reduce number of commands dumpTableSchema emits for binary upgrade. Avoid issuing a separate SQL UPDATE command for each column when directly manipulating pg_attribute contents in binary upgrade mode. With the separate updates, we triggered a relcache invalidation with each update. For a table with N columns, that causes O(N^2) relcache bloat in txn_size mode because the table's newly-created relcache entry can't be flushed till end of transaction. Reducing the number of commands is marginally faster as well as avoiding that problem. While at it, likewise avoid issuing a separate UPDATE on pg_constraint for each inherited constraint. This is less exciting, first because inherited (non-partitioned) constraints are relatively rare, and second because the backend has a good deal of trouble anyway with restoring tables containing many such constraints, due to MergeConstraintsIntoExisting being horribly inefficient. But it seems more consistent to do it this way here too, and it surely can't hurt. Per report from Justin Pryzby. Back-patch to v17 where txn_size mode was introduced. Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023 --- src/bin/pg_dump/pg_dump.c | 132 ++++++++++++++++++++++++++++---------- 1 file changed, 97 insertions(+), 35 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 7cd6a7fb97..2b02148559 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -15670,6 +15670,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) DumpOptions *dopt = fout->dopt; PQExpBuffer q = createPQExpBuffer(); PQExpBuffer delq = createPQExpBuffer(); + PQExpBuffer extra = createPQExpBuffer(); char *qrelname; char *qualrelname; int numParents; @@ -15736,7 +15737,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) char *partkeydef = NULL; char *ftoptions = NULL; char *srvname = NULL; - char *foreign = ""; + const char *foreign = ""; /* * Set reltypename, and collect any relkind-specific data that we @@ -16094,51 +16095,98 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) tbinfo->relkind == RELKIND_FOREIGN_TABLE || tbinfo->relkind == RELKIND_PARTITIONED_TABLE)) { + bool firstitem; + + /* + * Drop any dropped columns. Merge the pg_attribute manipulations + * into a single SQL command, so that we don't cause repeated + * relcache flushes on the target table. Otherwise we risk O(N^2) + * relcache bloat while dropping N columns. + */ + resetPQExpBuffer(extra); + firstitem = true; for (j = 0; j < tbinfo->numatts; j++) { if (tbinfo->attisdropped[j]) { - appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate dropped column.\n"); - appendPQExpBuffer(q, "UPDATE pg_catalog.pg_attribute\n" - "SET attlen = %d, " - "attalign = '%c', attbyval = false\n" - "WHERE attname = ", + if (firstitem) + { + appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate dropped columns.\n" + "UPDATE pg_catalog.pg_attribute\n" + "SET attlen = v.dlen, " + "attalign = v.dalign, " + "attbyval = false\n" + "FROM (VALUES "); + firstitem = false; + } + else + appendPQExpBufferStr(q, ",\n "); + appendPQExpBufferChar(q, '('); + appendStringLiteralAH(q, tbinfo->attnames[j], fout); + appendPQExpBuffer(q, ", %d, '%c')", tbinfo->attlen[j], tbinfo->attalign[j]); - appendStringLiteralAH(q, tbinfo->attnames[j], fout); - appendPQExpBufferStr(q, "\n AND attrelid = "); - appendStringLiteralAH(q, qualrelname, fout); - appendPQExpBufferStr(q, "::pg_catalog.regclass;\n"); - - if (tbinfo->relkind == RELKIND_RELATION || - tbinfo->relkind == RELKIND_PARTITIONED_TABLE) - appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); - else - appendPQExpBuffer(q, "ALTER FOREIGN TABLE ONLY %s ", - qualrelname); - appendPQExpBuffer(q, "DROP COLUMN %s;\n", + /* The ALTER ... DROP COLUMN commands must come after */ + appendPQExpBuffer(extra, "ALTER %sTABLE ONLY %s ", + foreign, qualrelname); + appendPQExpBuffer(extra, "DROP COLUMN %s;\n", fmtId(tbinfo->attnames[j])); } - else if (!tbinfo->attislocal[j]) + } + if (!firstitem) + { + appendPQExpBufferStr(q, ") v(dname, dlen, dalign)\n" + "WHERE attrelid = "); + appendStringLiteralAH(q, qualrelname, fout); + appendPQExpBufferStr(q, "::pg_catalog.regclass\n" + " AND attname = v.dname;\n"); + /* Now we can issue the actual DROP COLUMN commands */ + appendBinaryPQExpBuffer(q, extra->data, extra->len); + } + + /* + * Fix up inherited columns. As above, do the pg_attribute + * manipulations in a single SQL command. + */ + firstitem = true; + for (j = 0; j < tbinfo->numatts; j++) + { + if (!tbinfo->attisdropped[j] && + !tbinfo->attislocal[j]) { - appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited column.\n"); - appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n" - "SET attislocal = false\n" - "WHERE attname = "); + if (firstitem) + { + appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited columns.\n"); + appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n" + "SET attislocal = false\n" + "WHERE attrelid = "); + appendStringLiteralAH(q, qualrelname, fout); + appendPQExpBufferStr(q, "::pg_catalog.regclass\n" + " AND attname IN ("); + firstitem = false; + } + else + appendPQExpBufferStr(q, ", "); appendStringLiteralAH(q, tbinfo->attnames[j], fout); - appendPQExpBufferStr(q, "\n AND attrelid = "); - appendStringLiteralAH(q, qualrelname, fout); - appendPQExpBufferStr(q, "::pg_catalog.regclass;\n"); } } + if (!firstitem) + appendPQExpBufferStr(q, ");\n"); /* * Add inherited CHECK constraints, if any. * * For partitions, they were already dumped, and conislocal * doesn't need fixing. + * + * As above, issue only one direct manipulation of pg_constraint. + * Although it is tempting to merge the ALTER ADD CONSTRAINT + * commands into one as well, refrain for now due to concern about + * possible backend memory bloat if there are many such + * constraints. */ + resetPQExpBuffer(extra); + firstitem = true; for (k = 0; k < tbinfo->ncheck; k++) { ConstraintInfo *constr = &(tbinfo->checkexprs[k]); @@ -16146,18 +16194,31 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) if (constr->separate || constr->conislocal || tbinfo->ispartition) continue; - appendPQExpBufferStr(q, "\n-- For binary upgrade, set up inherited constraint.\n"); + if (firstitem) + appendPQExpBufferStr(q, "\n-- For binary upgrade, set up inherited constraints.\n"); appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ADD CONSTRAINT %s %s;\n", foreign, qualrelname, fmtId(constr->dobj.name), constr->condef); - appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_constraint\n" - "SET conislocal = false\n" - "WHERE contype = 'c' AND conname = "); - appendStringLiteralAH(q, constr->dobj.name, fout); - appendPQExpBufferStr(q, "\n AND conrelid = "); - appendStringLiteralAH(q, qualrelname, fout); - appendPQExpBufferStr(q, "::pg_catalog.regclass;\n"); + /* Update pg_constraint after all the ALTER TABLEs */ + if (firstitem) + { + appendPQExpBufferStr(extra, "UPDATE pg_catalog.pg_constraint\n" + "SET conislocal = false\n" + "WHERE contype = 'c' AND conrelid = "); + appendStringLiteralAH(extra, qualrelname, fout); + appendPQExpBufferStr(extra, "::pg_catalog.regclass\n"); + appendPQExpBufferStr(extra, " AND conname IN ("); + firstitem = false; + } + else + appendPQExpBufferStr(extra, ", "); + appendStringLiteralAH(extra, constr->dobj.name, fout); + } + if (!firstitem) + { + appendPQExpBufferStr(extra, ");\n"); + appendBinaryPQExpBuffer(q, extra->data, extra->len); } if (numParents > 0 && !tbinfo->ispartition) @@ -16445,6 +16506,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) destroyPQExpBuffer(q); destroyPQExpBuffer(delq); + destroyPQExpBuffer(extra); free(qrelname); free(qualrelname); } -- 2.43.5 [text/x-diff] v1-0003-Count-individual-SQL-commands-in-pg_restore-s-tra.patch (2.8K, 4-v1-0003-Count-individual-SQL-commands-in-pg_restore-s-tra.patch) download | inline diff: From 7cfea69d3f5df4c15681f4902a86b366f0ed4292 Mon Sep 17 00:00:00 2001 From: Tom Lane <[email protected]> Date: Sun, 28 Jul 2024 16:40:35 -0400 Subject: [PATCH v1 3/3] Count individual SQL commands in pg_restore's --transaction-size mode. The initial implementation counted one action per TOC entry (except for some special cases for multi-blob BLOBS entries). This assumes that TOC entries are all about equally complex, but it turns out that that assumption doesn't hold up very well in binary-upgrade mode. For example, even after the previous patch I was able to cause backend bloat with tables having many inherited constraints. To fix, count multi-command TOC entries as N actions, allowing the transaction size to be scaled down when we hit a complex TOC entry. Rather than add a SQL parser to pg_restore, approximate "multi command" by counting semicolons in the TOC entry's defn string. This will be fooled by semicolons appearing in string literals --- but the error is in the conservative direction, so it doesn't seem worth working harder. The biggest risk is with function/procedure TOC entries, but we can just explicitly skip those. (This is undoubtedly a hack, and maybe someday we'll be able to revert it after fixing the backend's bloat issues or rethinking what pg_dump emits in binary upgrade mode. But that surely isn't a project for v17.) Thanks to Alexander Korotkov for the let's-count-semicolons idea. Per report from Justin Pryzby. Back-patch to v17 where txn_size mode was introduced. Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023 --- src/bin/pg_dump/pg_backup_archiver.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c index 68e321212d..8c20c263c4 100644 --- a/src/bin/pg_dump/pg_backup_archiver.c +++ b/src/bin/pg_dump/pg_backup_archiver.c @@ -3827,10 +3827,32 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData) { IssueACLPerBlob(AH, te); } - else + else if (te->defn && strlen(te->defn) > 0) { - if (te->defn && strlen(te->defn) > 0) - ahprintf(AH, "%s\n\n", te->defn); + ahprintf(AH, "%s\n\n", te->defn); + + /* + * If the defn string contains multiple SQL commands, txn_size mode + * should count it as N actions not one. But rather than build a full + * SQL parser, approximate this by counting semicolons. One case + * where that tends to be badly fooled is function definitions, so + * ignore them. (restore_toc_entry will count one action anyway.) + */ + if (ropt->txn_size > 0 && + strcmp(te->desc, "FUNCTION") != 0 && + strcmp(te->desc, "PROCEDURE") != 0) + { + const char *p = te->defn; + int nsemis = 0; + + while ((p = strchr(p, ';')) != NULL) + { + nsemis++; + p++; + } + if (nsemis > 1) + AH->txnCount += nsemis - 1; + } } /* -- 2.43.5 ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2024-07-31 13:39 Alexander Korotkov <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 15+ messages in thread From: Alexander Korotkov @ 2024-07-31 13:39 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers On Mon, Jul 29, 2024 at 12:24 AM Tom Lane <[email protected]> wrote: > So I'm forced to the conclusion that we'd better make the transaction > size adaptive as per Alexander's suggestion. > > In addition to the patches attached, I experimented with making > dumpTableSchema fold all the ALTER TABLE commands for a single table > into one command. That's do-able without too much effort, but I'm now > convinced that we shouldn't. It would break the semicolon-counting > hack for detecting that tables like these involve extra work. > I'm also not very confident that the backend won't have trouble with > ALTER TABLE commands containing hundreds of subcommands. That's > something we ought to work on probably, but it's not a project that > I want to condition v17 pg_upgrade's stability on. > > Anyway, proposed patches attached. 0001 is some trivial cleanup > that I noticed while working on the failed single-ALTER-TABLE idea. > 0002 merges the catalog-UPDATE commands that dumpTableSchema issues, > and 0003 is Alexander's suggestion. Nice to see you picked up my idea. I took a look over the patchset. Looks good to me. ------ Regards, Alexander Korotkov Supabase ^ permalink raw reply [nested|flat] 15+ messages in thread
* Re: pg_upgrade failing for 200+ million Large Objects @ 2026-03-23 19:47 PP L <[email protected]> parent: Alexander Korotkov <[email protected]> 0 siblings, 0 replies; 15+ messages in thread From: PP L @ 2026-03-23 19:47 UTC (permalink / raw) To: Alexander Korotkov <[email protected]>; Tom Lane <[email protected]>; +Cc: Justin Pryzby <[email protected]>; Nathan Bossart <[email protected]>; Michael Banck <[email protected]>; Laurenz Albe <[email protected]>; vignesh C <[email protected]>; Kumar, Sachin <[email protected]>; Robins Tharakan <[email protected]>; Jan Wieck <[email protected]>; Bruce Momjian <[email protected]>; Andrew Dunstan <[email protected]>; Magnus Hagander <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers Hello hackers, I wanted to revive this thread specifically around the attislocal optimization discussion. As part of https://github.com/postgres/postgres/commit/b3f0e0503f3, we now batch all the attislocal UPDATEs together, hence making it more performant. I think we might be able to go one step further and completely skip the attislocal UPDATE for partition tables. This is because the attislocal UPDATE is done immediately after 'CREATE TABLE', during the 'ATTACH PARTITION' step(see attislocal being set to false in MergeAttributesIntoExisting). The UPDATEs emitted by pg_dump are therefore redundant. Even with batching, the single UPDATE still modifies N(no of columns) rows causing N relcache invalidations. This same workflow is then repeated by ATTACH PARTITION causing another N relcache invalidations. Skipping the attislocal UPDATE definitely speeds up the runtime if there are a lot of partition tables because we will avoid quite a lot of relcache invalidations and rebuild calls. Since this optimization removes the attislocal UPDATE completely, the effect will be even more pronounced for wider partition tables. There's already precedent for this: * attinhcount is never explicitly set by pg_dump. It is only modified by MergeAttributesIntoExisting during ATTACH PARTITION * conislocal for CHECK constraints is explicitly not fixed for partitions. See comment "No need to fix conislocal: ATTACH PARTITION does that" in dumpTableSchema The only risk I can foresee is the window between CREATE TABLE and ATTACH PARTITION where attislocal will be incorrectly set to true. But I think this window is small enough to not worry about since ATTACH PARTITION immediately succeeds CREATE TABLE(maybe barring some other minor updates) Here is a simple patch ``` diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 137161aa5e0..d3d7403228a 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -17734,7 +17734,8 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) for (j = 0; j < tbinfo->numatts; j++) { if (!tbinfo->attisdropped[j] && - !tbinfo->attislocal[j]) + !tbinfo->attislocal[j] && + !tbinfo->ispartition) { if (firstitem) { ``` I tried a few experiments on my local macbook and noticed an improvement of around ~33-36% (% can vary mostly depending on the number of columns) Setup(master branch): 300 partitioned root tables with 200 leaves each = 60000 partition tables 300 columns per partition: baseline : ~30 mins with patch : ~20 mins (~33% faster) 700 columns per partition: baseline : ~66 mins with patch : ~42 mins (~36% faster) Thanks Nikhil Broadcom Inc. On Mon, 23 Mar 2026 at 11:50, Alexander Korotkov <[email protected]> wrote: > On Mon, Jul 29, 2024 at 12:24 AM Tom Lane <[email protected]> wrote: > > So I'm forced to the conclusion that we'd better make the transaction > > size adaptive as per Alexander's suggestion. > > > > In addition to the patches attached, I experimented with making > > dumpTableSchema fold all the ALTER TABLE commands for a single table > > into one command. That's do-able without too much effort, but I'm now > > convinced that we shouldn't. It would break the semicolon-counting > > hack for detecting that tables like these involve extra work. > > I'm also not very confident that the backend won't have trouble with > > ALTER TABLE commands containing hundreds of subcommands. That's > > something we ought to work on probably, but it's not a project that > > I want to condition v17 pg_upgrade's stability on. > > > > Anyway, proposed patches attached. 0001 is some trivial cleanup > > that I noticed while working on the failed single-ALTER-TABLE idea. > > 0002 merges the catalog-UPDATE commands that dumpTableSchema issues, > > and 0003 is Alexander's suggestion. > > Nice to see you picked up my idea. I took a look over the patchset. > Looks good to me. > > ------ > Regards, > Alexander Korotkov > Supabase > > > > > ^ permalink raw reply [nested|flat] 15+ messages in thread
end of thread, other threads:[~2026-03-23 19:47 UTC | newest] Thread overview: 15+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-04-01 19:28 Re: pg_upgrade failing for 200+ million Large Objects Tom Lane <[email protected]> 2024-04-01 19:37 ` Nathan Bossart <[email protected]> 2024-07-24 14:17 ` Justin Pryzby <[email protected]> 2024-07-26 19:53 ` Alexander Korotkov <[email protected]> 2024-07-26 20:05 ` Tom Lane <[email protected]> 2024-07-26 20:36 ` Justin Pryzby <[email protected]> 2024-07-26 21:42 ` Alexander Korotkov <[email protected]> 2024-07-26 22:37 ` Tom Lane <[email protected]> 2024-07-26 22:55 ` Alexander Korotkov <[email protected]> 2024-07-26 23:06 ` Tom Lane <[email protected]> 2024-07-27 03:00 ` Alexander Korotkov <[email protected]> 2024-07-27 03:08 ` Tom Lane <[email protected]> 2024-07-28 21:24 ` Tom Lane <[email protected]> 2024-07-31 13:39 ` Alexander Korotkov <[email protected]> 2026-03-23 19:47 ` PP L <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox