public inbox for [email protected]
help / color / mirror / Atom feedFrom: Tharakan, Robins <[email protected]>
To: [email protected] <[email protected]>
Subject: RE: pg_upgrade failing for 200+ million Large Objects
Date: Sun, 7 Mar 2021 08:43:28 +0000
Message-ID: <[email protected]> (raw)
Hi all,
Attached is a proof-of-concept patch that allows Postgres to perform
pg_upgrade if the instance has Millions of objects.
It would be great if someone could take a look and see if this patch is in
the right direction. There are some pending tasks (such as documentation /
pg_resetxlog vs pg_resetwal related changes) but for now, the patch helps
remove a stalemate where if a Postgres instance has a large number
(accurately speaking 146+ Million) of Large Objects, pg_upgrade fails. This
is easily reproducible and besides deleting Large Objects before upgrade,
there is no other (apparent) way for pg_upgrade to complete.
The patch (attached):
- Applies cleanly on REL9_6_STABLE -
c7a4fc3dd001646d5938687ad59ab84545d5d043
- 'make check' passes
- Allows the user to provide a constant via pg_upgrade command-line, that
overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
(window of) Transaction IDs available for pg_upgrade to complete.
Sample argument for pg_upgrade:
$ /opt/postgres/96/bin/pg_upgrade --max-limit-xid 1000000000 --old-bindir
...
With this patch, pg_upgrade is now able to upgrade a v9.5 cluster with 500
million Large Objects successfully to v9.6 - some stats below:
Source Postgres - v9.5.24
Target Version - v9.6.21
Large Object Count: 500 Million Large Objects
Machine - r5.4xlarge (16vCPU / 128GB RAM + 1TB swap)
Memory used during pg_upgrade - ~350GB
Time taken - 25+ hrs. (tested twice) - (All LOs processed sequentially ->
Scope for optimization)
Although counter-intuitive, for this testing purpose all Large Objects were
small (essentially the idea was to test the count) and created by using
something like this:
seq 1 50000 | xargs -n 1 -i -P 10 /opt/postgres/95/bin/psql -c "select
lo_from_bytea(0, '\xffffff00') from generate_series(1,10000);" > /dev/null
I am not married to the patch (especially the argument name) but ideally I'd
prefer a way to get this upgrade going without a patch. For now, I am unable
to find any other way to upgrade a v9.5 Postgres database in this scenario,
facing End-of-Life.
Reference:
1) 2 Billion constant -
https://github.com/postgres/postgres/blob/ca3b37487be333a1d241dab1bbdd17a211
a88f43/src/bin/pg_resetwal/pg_resetwal.c#L444
Thanks,
Robins Tharakan
> -----Original Message-----
> From: Tharakan, Robins
> Sent: Wednesday, 3 March 2021 10:36 PM
> To: [email protected]
> Subject: pg_upgrade failing for 200+ million Large Objects
>
> Hi,
>
> While reviewing a failed upgrade from Postgres v9.5 (to v9.6) I saw that
> the
> instance had ~200 million (in-use) Large Objects. I was able to reproduce
> this on a test instance which too fails with a similar error.
>
>
> pg_restore: executing BLOB 4980622
> pg_restore: WARNING: database with OID 0 must be vacuumed within 1000001
> transactions
> HINT: To avoid a database shutdown, execute a database-wide VACUUM in
> that
> database.
> You might also need to commit or roll back old prepared transactions.
> pg_restore: executing BLOB 4980623
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2565; 2613 4980623 BLOB
> 4980623 postgres
> pg_restore: [archiver (db)] could not execute query: ERROR: database is
> not
> accepting commands to avoid wraparound data loss in database with OID 0
> HINT: Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
> Command was: SELECT pg_catalog.lo_create('4980623');
>
>
>
> To remove the obvious possibilities, these Large Objects that are still
> in-use (so vacuumlo wouldn't help), giving more system resources doesn't
> help, moving Large Objects around to another database doesn't help (since
> this is cluster-wide restriction), the source instance is nowhere close
> to
> wraparound and lastly recent-most minor versions don't help either (I
> tried
> compiling 9_6_STABLE + upgrade database with 150 million LO and still
> encountered the same issue).
>
> Do let me know if I am missing something obvious but it appears that this
> is
> happening owing to 2 things coming together:
>
> * Each Large Object is migrated in its own transaction during pg_upgrade
> * pg_resetxlog appears to be narrowing the window (available for
> pg_upgrade)
> to ~146 Million XIDs (2^31 - 1 million XID wraparound margin - 2 billion
> which is a hard-coded constant - see [1] - in what appears to be an
> attempt
> to force an Autovacuum Wraparound session soon after upgrade completes).
>
> Ideally such an XID based restriction, is limiting for an instance that's
> actively using a lot of Large Objects. Besides forcing AutoVacuum
> Wraparound
> logic to kick in soon after, I am unclear what much else it aims to do.
> What
> it does seem to be doing is to block Major Version upgrades if the
> pre-upgrade instance has >146 Million Large Objects (half that, if the LO
> additionally requires ALTER LARGE OBJECT OWNER TO for each of those
> objects
> during pg_restore)
>
> For long-term these ideas came to mind, although am unsure which are
> low-hanging fruits and which outright impossible - For e.g. clubbing
> multiple objects in a transaction [2] / Force AutoVacuum post upgrade
> (and
> thus remove this limitation altogether) or see if "pg_resetxlog -x" (from
> within pg_upgrade) could help in some way to work-around this limitation.
>
> Is there a short-term recommendation for this scenario?
>
> I can understand a high number of small-sized objects is not a great way
> to
> use pg_largeobject (since Large Objects was intended to be for, well,
> 'large
> objects') but this magic number of Large Objects is now a stalemate at
> this
> point (with respect to v9.5 EOL).
>
>
> Reference:
> 1) pg_resetxlog -
> https://github.com/postgres/postgres/blob/ca3b37487be333a1d241dab1bbdd17a
> 211
> a88f43/src/bin/pg_resetwal/pg_resetwal.c#L444
> 2)
> https://www.postgresql.org/message-id/ed7d86a1-b907-4f53-9f6e-
> 63482d2f2bac%4
> 0manitou-mail.org
>
> -
> Thanks
> Robins Tharakan
Attachments:
[application/octet-stream] pgupgrade_lo_v2.patch (5.4K, 2-pgupgrade_lo_v2.patch)
download | inline diff:
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 3e79482ca2..f2e9824cb5 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -67,6 +67,7 @@ static TransactionId set_xid = 0;
static TransactionId set_oldest_commit_ts_xid = 0;
static TransactionId set_newest_commit_ts_xid = 0;
static Oid set_oid = 0;
+static Oid max_limit_xid = 2000000000;
static MultiXactId set_mxid = 0;
static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
static uint32 minXlogTli = 0;
@@ -116,7 +117,7 @@ main(int argc, char *argv[])
}
- while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:")) != -1)
+ while ((c = getopt(argc, argv, "c:D:e:fl:L:m:no:O:x:")) != -1)
{
switch (c)
{
@@ -210,6 +211,21 @@ main(int argc, char *argv[])
}
break;
+ case 'L':
+ max_limit_xid = strtoul(optarg, &endptr, 0);
+ if (endptr == optarg || *endptr != '\0')
+ {
+ fprintf(stderr, _("%s: invalid argument for option %s\n"), progname, "-L");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+ exit(1);
+ }
+ if (max_limit_xid <= 500000000)
+ {
+ fprintf(stderr, _("%s: Max Limit XID (-L) must not be less than 500 Million\n"), progname);
+ exit(1);
+ }
+ break;
+
case 'm':
set_mxid = strtoul(optarg, &endptr, 0);
if (endptr == optarg || *endptr != ',')
@@ -381,7 +397,7 @@ main(int argc, char *argv[])
* reasonably safe. The magic constant here corresponds to the
* maximum allowed value of autovacuum_freeze_max_age.
*/
- ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
+ ControlFile.checkPointCopy.oldestXid = set_xid - max_limit_xid;
if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
@@ -1239,6 +1255,7 @@ usage(void)
printf(_(" -e XIDEPOCH set next transaction ID epoch\n"));
printf(_(" -f force update to be done\n"));
printf(_(" -l XLOGFILE force minimum WAL starting location for new transaction log\n"));
+ printf(_(" -L MAXLIMITXID force max XID starting location for new transaction log\n"));
printf(_(" -m MXID,MXID set next and oldest multitransaction ID\n"));
printf(_(" -n no update, just show what would be done (for testing)\n"));
printf(_(" -o OID set next OID\n"));
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 7ab284a51b..3a861739f7 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -53,6 +53,7 @@ parseCommandLine(int argc, char *argv[])
{"link", no_argument, NULL, 'k'},
{"retain", no_argument, NULL, 'r'},
{"jobs", required_argument, NULL, 'j'},
+ {"max-limit-xid", required_argument, NULL, 'L'},
{"verbose", no_argument, NULL, 'v'},
{NULL, 0, NULL, 0}
};
@@ -64,6 +65,7 @@ parseCommandLine(int argc, char *argv[])
time_t run_time = time(NULL);
user_opts.transfer_mode = TRANSFER_MODE_COPY;
+ user_opts.maxlimitxid = 2000000000;
os_info.progname = get_progname(argv[0]);
@@ -101,7 +103,7 @@ parseCommandLine(int argc, char *argv[])
if ((log_opts.internal = fopen_priv(INTERNAL_LOG_FILE, "a")) == NULL)
pg_fatal("cannot write to log file %s\n", INTERNAL_LOG_FILE);
- while ((option = getopt_long(argc, argv, "d:D:b:B:cj:ko:O:p:P:rU:v",
+ while ((option = getopt_long(argc, argv, "d:D:b:B:cj:ko:L:O:p:P:rU:v",
long_options, &optindex)) != -1)
{
switch (option)
@@ -132,6 +134,10 @@ parseCommandLine(int argc, char *argv[])
user_opts.jobs = atoi(optarg);
break;
+ case 'L':
+ user_opts.maxlimitxid = atoi(optarg);
+ break;
+
case 'k':
user_opts.transfer_mode = TRANSFER_MODE_LINK;
break;
@@ -287,6 +293,7 @@ Options:\n\
-D, --new-datadir=DATADIR new cluster data directory\n\
-j, --jobs=NUM number of simultaneous processes or threads to use\n\
-k, --link link instead of copying files to new cluster\n\
+ -L, --max-limit-xid=NUM max-limit XIDs to consider\n\
-o, --old-options=OPTIONS old cluster options to pass to the server\n\
-O, --new-options=OPTIONS new cluster options to pass to the server\n\
-p, --old-port=PORT old cluster port number (default %d)\n\
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 02078c0357..2d0f3a7e04 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -411,8 +411,10 @@ copy_clog_xlog_xid(void)
/* set the next transaction id and epoch of the new cluster */
prep_status("Setting next transaction ID and epoch for new cluster");
exec_prog(UTILITY_LOG_FILE, NULL, true,
- "\"%s/pg_resetxlog\" -f -x %u \"%s\"",
- new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
+ "\"%s/pg_resetxlog\" -L %u -f -x %u \"%s\"",
+ new_cluster.bindir,
+ user_opts.maxlimitxid,
+ old_cluster.controldata.chkpnt_nxtxid,
new_cluster.pgdata);
exec_prog(UTILITY_LOG_FILE, NULL, true,
"\"%s/pg_resetxlog\" -f -e %u \"%s\"",
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 9fbdacc53e..50fe73ae09 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -298,6 +298,7 @@ typedef struct
* changes */
transferMode transfer_mode; /* copy files or link them? */
int jobs;
+ int maxlimitxid;
} UserOpts;
[application/pkcs7-signature] smime.p7s (5.0K, 3-smime.p7s)
download
view thread (49+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected]
Subject: RE: pg_upgrade failing for 200+ million Large Objects
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox