From: Maxim Orlov <[email protected]>
To: Heikki Linnakangas <[email protected]>
Cc: wenhui qiu <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: make mxidoff 64 bits
Date: Tue, 19 Nov 2024 20:53:49 +0300
Message-ID: <CACG=ezajc_Pcqmy6fcq-N8+LzCRMzOzJzez2_BgHEu-6RVJtKQ@mail.gmail.com>
In-Reply-To: <CACG=ezZGQFBb0yepka8hU2BmJ48ujt3xa+aYLNL0BQPx0vqwZg@mail.gmail.com>
References: <CACG=ezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com>
<CACG=ezYokoiumOFnqUfg_ffHD5s8T+6iHYfzKLfa=QQ-1pNrBg@mail.gmail.com>
<CACG=ezY9xq73jcX_EjVqx5-f90nbQ9PyhFCTW2fwFCS2wmNiFw@mail.gmail.com>
<CACG=eza+27CfLBobJJccRhXrA3He6c1irAnoyTtSC1-z9UXLrg@mail.gmail.com>
<CAPpHfduczcop9s6gKUpLGgFUe2y4ERGMJx6SS6Kp+s-kQPwMjg@mail.gmail.com>
<CACG=ezbye4g_ERNqE=gBcvQ0YypRaVENhNUu8xrs4PL12UdnUA@mail.gmail.com>
<CACG=ezaMncd0-BcGHBgsSR2eqHfrz9WznHGLKX8biz6zu-azGw@mail.gmail.com>
<[email protected]>
<CACG=ezb9XTvd3ZmS0y8gUunx_wBBdJO7ou+BfCOnnA5jE-11vg@mail.gmail.com>
<CACG=ezYFNqGjsxF6Vb2CHF6JzKcjhAFauaFm9js0nu_3Ngcdkw@mail.gmail.com>
<CAGjGUA+dcV7veaCV1H65vCNsbS++nT8=ho772gDvsXUW9H7eXQ@mail.gmail.com>
<CACG=ezYThNkf8QsDA-aQfEFEkqn2L=_uUL83z0vJstPRasbZqg@mail.gmail.com>
<[email protected]>
<CACG=ezYtCatcRODS-ZkwhcxuqBKCuhEsZGBruw=dGCLoepF+ZA@mail.gmail.com>
<[email protected]>
<CACG=ezb680eb=JXh1ns=t5eGH3h9y-uTfT4tf3Xc8t2UH2q6tQ@mail.gmail.com>
<CACG=ezZGQFBb0yepka8hU2BmJ48ujt3xa+aYLNL0BQPx0vqwZg@mail.gmail.com>
Oops! Sorry for the noise. I must have been overworked yesterday and
messed up the working branches. v7 was a correct set and v8 was not. Here is
the correction, with an extended Perl test.
The test itself is in src/bin/pg_upgrade/t/005_offset.pl. It is rather heavy:
it took about 45 minutes on my i5 and generated 2.7 GB of data. Basically,
each test here creates a cluster and fills it with multixacts. Thus,
dozens of segments are created using two methods. One uses prepared
transactions and creates roughly the same number of segments for
members and for offsets. The other is based on Heikki's multixids.py
and creates more members than offsets. I've used both of these methods to
generate data that is as diverse as possible.
Here is how I test this patch set:
1. You need two pg clusters: the "old" one, i.e. without the patch set, and
the "new" one with patch set v9 applied.
2. Apply v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt
to the "old" and "new" clusters. Note that this is the only patch required for
the "old" cluster. It allows you to create a cluster with non-standard
initial multixact and multixact offset values. Unfortunately, this patch
did not arouse public interest, since it is assumed that the pg_resetwal
utility provides similar functionality. But similar does not mean equal.
pg_resetwal must be used after cluster init, so we run into
problems with vacuum, and some SLRU segments must be filled with zeroes.
Also, template0 datminmxid must be updated manually. So, in my view,
using this patch is justified and very handy here.
3. Also, apply all the "TEST" (0006 and 0007) patches to the "new"
cluster.
4. Build the "old" and "new" pg clusters.
5. Run the test with: PROVE_TESTS=t/005_offset.pl PG_TEST_NOCLEAN=1
oldinstall=/home/orlov/proj/OFFSET3/pgsql-old make check -s -C
src/bin/pg_upgrade/
6. In my case, it took around 45 minutes and generated roughly 2.7 GB of
data.
The "TEST" patches, of course, are for test purposes only and are not to be
committed.
In src/bin/pg_upgrade/t/005_offset.pl I try to cover the following cases:
- Basic sanity checks.
Here I test various initial multi and offset values (including
wraparound) and check that the appropriate segments are generated.
- pg_upgrade tests.
This is what the oldinstall ENV is for. Run pg_upgrade for an old cluster
with multi and offset values just like in the previous step, i.e. with
various combinations.
- Self pg_upgrade.
--
Best regards,
Maxim Orlov.
From 2642f597832cbed0ebc54202de4e0f5770ac5f50 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v9 5/7] TEST: initdb option to initialize cluster with
non-standard xid/mxid/mxoff
To date, testing database cluster wraparound was not easy, as initdb has always
initialized it with the default xid/mxid/mxoff. The option to specify any valid
xid/mxid/mxoff at cluster startup will make these things easier.
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%4...
---
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 ++++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++--
src/backend/bootstrap/bootstrap.c | 50 +++++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 +++-
src/backend/tcop/postgres.c | 53 +++++++++++-
src/bin/initdb/initdb.c | 107 ++++++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++++
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
13 files changed, 382 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the commit log page that contains nextXid */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a817f539ee..095c39dd93 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the offsets page that contains nextMXact */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -1978,7 +1999,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the members page that contains nextOffset */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we're not starting from offset zero, initialize a dummy multixact to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index d31a67599c..8c33b8ba9d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -217,7 +217,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -272,12 +272,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid value")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 78e66a06ac..483307279f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -572,7 +572,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -669,10 +669,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -723,6 +731,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 184b830168..4fd594cfe5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..410868dddf 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^32-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^32-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^32-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Don't let mxid be silently set to
+ * FirstMultiXactId, though that would do no harm.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Don't let xid be silently set to
+ * FirstNormalTransactionId, though that would do no harm.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index e1b3187d0b..f770e9a140 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
--
2.43.0
From 33e21cf86b1813a67c699d703ab1f75bcf28a7b1 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 13 Nov 2024 16:34:34 +0300
Subject: [PATCH v9 7/7] TEST: bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/catalog/catversion.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..18faedc963 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411112
/*
* large object chunk size added to pg_controldata,
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5dd91e190a..3d09caf5ae 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411111
+#define CATALOG_VERSION_NO 202411112
#endif
--
2.43.0
From 3558ccb4712d50bcda877474db5c9fd124b6e919 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 19 Nov 2024 17:08:10 +0300
Subject: [PATCH v9 6/7] TEST: add src/bin/pg_upgrade/t/005_offset.pl
---
src/bin/pg_upgrade/t/005_offset.pl | 562 +++++++++++++++++++++++++++++
1 file changed, 562 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/005_offset.pl
diff --git a/src/bin/pg_upgrade/t/005_offset.pl b/src/bin/pg_upgrade/t/005_offset.pl
new file mode 100644
index 0000000000..1cfd8b364a
--- /dev/null
+++ b/src/bin/pg_upgrade/t/005_offset.pl
@@ -0,0 +1,562 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# This pair of calls will create significantly more member segments than offset
+# segments.
+sub prep
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+}
+
+sub fill
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ my $nclients = 50;
+ my $update_every = 90;
+ my @connections = ();
+
+ for (0..$nclients)
+ {
+ my $conn = $node->background_psql('postgres');
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ for (my $i = 0; $i < 20000; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} FOR KEY SHARE");
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+}
+
+# This pair of calls will create more or less the same number of member and
+# offset segments.
+sub prep2
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl}(BAR INT PRIMARY KEY, BAZ INT); " .
+ "CREATE OR REPLACE PROCEDURE MXIDFILLER(N_STEPS INT DEFAULT 1000) " .
+ "LANGUAGE PLPGSQL " .
+ "AS \$\$ " .
+ "BEGIN " .
+ " FOR I IN 1..N_STEPS LOOP " .
+ " UPDATE ${tbl} SET BAZ = RANDOM(1, 1000) " .
+ " WHERE BAR IN (SELECT BAR FROM ${tbl} " .
+ " TABLESAMPLE BERNOULLI(80)); " .
+ " COMMIT; " .
+ " END LOOP; " .
+ "END; \$\$; " .
+ "INSERT INTO ${tbl} (BAR, BAZ) " .
+ "SELECT ID, ID FROM GENERATE_SERIES(1, 1024) ID;");
+}
+
+sub fill2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ $node->safe_psql('postgres',
+ "BEGIN; " .
+ "SELECT * FROM ${tbl} FOR KEY SHARE; " .
+ "PREPARE TRANSACTION 'A'; " .
+ "CALL MXIDFILLER((365 * ${scale})::int); " .
+ "COMMIT PREPARED 'A';");
+}
+
+
+# generate around 2 offset segments and 55 member segments
+sub mxid_gen1
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ prep($node, $tbl);
+ fill($node, $tbl);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# generate around 10 offset segments and 12 member segments
+sub mxid_gen2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ prep2($node, $tbl);
+ fill2($node, $tbl, $scale);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# Fetch latest multixact checkpoint values.
+sub multi_bounds
+{
+ my ($node) = @_;
+ my $path = $node->config_data('--bindir');
+ my ($stdout, $stderr) = run_command([
+ $path . '/pg_controldata',
+ $node->data_dir
+ ]);
+ my @control_data = split("\n", $stdout);
+ my $next = undef;
+ my $oldest = undef;
+ my $next_offset = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiXactId:\s*(.*)$/mg)
+ {
+ $next = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's oldestMultiXid:\s*(.*)$/mg)
+ {
+ $oldest = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+ {
+ $next_offset = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if (defined($oldest) && defined($next) && defined($next_offset))
+ {
+ last;
+ }
+ }
+
+ die "Latest checkpoint's NextMultiXactId not found in control file!\n"
+ unless defined($next);
+
+ die "Latest checkpoint's oldestMultiXid not found in control file!\n"
+ unless defined($oldest);
+
+ die "Latest checkpoint's NextMultiOffset not found in control file!\n"
+ unless defined($next_offset);
+
+ return ($oldest, $next, $next_offset);
+}
+
+# Create node from existing bins.
+sub create_new_node
+{
+ my ($name, %params) = @_;
+
+ create_node(0, @_);
+}
+
+# Create node from ENV oldinstall
+sub create_old_node
+{
+ my ($name, %params) = @_;
+
+ if (!defined($ENV{oldinstall}))
+ {
+ die "oldinstall is not defined";
+ }
+
+ create_node(1, @_);
+}
+
+sub create_node
+{
+ my ($install_path_from_env, $name, %params) = @_;
+ my $scale = defined $params{scale} ? $params{scale} : 1;
+ my $multi = defined $params{multi} ? $params{multi} : undef;
+ my $offset = defined $params{offset} ? $params{offset} : undef;
+
+ my $node =
+ $install_path_from_env ?
+ PostgreSQL::Test::Cluster->new($name,
+ install_path => $ENV{oldinstall}) :
+ PostgreSQL::Test::Cluster->new($name);
+
+ $node->init(force_initdb => 1,
+ extra => [
+ $multi ? ('-m', $multi) : (),
+ $offset ? ('-o', $offset) : (),
+ ]);
+
+ # Fixup MOX patch quirk
+ if ($multi)
+ {
+ unlink $node->data_dir . '/pg_multixact/offsets/0000';
+ }
+ if ($offset)
+ {
+ unlink $node->data_dir . '/pg_multixact/members/0000';
+ }
+
+ $node->append_conf('postgresql.conf', 'fsync = off');
+ $node->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+
+ $node->start();
+ mxid_gen2($node, 'FOO', $scale);
+ mxid_gen1($node, 'BAR', $scale);
+ $node->restart();
+ $node->safe_psql('postgres', q(SELECT * FROM FOO)); # just in case...
+ $node->safe_psql('postgres', q(SELECT * FROM BAR));
+ $node->safe_psql('postgres', q(CHECKPOINT));
+ $node->stop();
+
+ return $node;
+}
+
+sub do_upgrade
+{
+ my ($oldnode, $newnode) = @_;
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--check'
+ ],
+ 'run of pg_upgrade --check');
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ $oldnode->start();
+ $newnode->start();
+
+ my $oldfoo = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newfoo = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldfoo, $newfoo, "select foo eq");
+
+ my $oldbar = $oldnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ my $newbar = $newnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ is($oldbar, $newbar, "select bar eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+
+ multi_bounds($oldnode);
+ multi_bounds($newnode);
+}
+
+my @TESTS = (
+ # tests without ENV oldinstall
+ 0, 1, 2, 3, 4, 5, 6,
+ # tests with "real" pg_upgrade
+ 100, 101, 102, 103, 104, 105, 106,
+ # self upgrade
+ 1000,
+);
+
+# =============================================================================
+# Basic sanity tests on a NEW bin
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 0;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mo',
+ scale => 1);
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value
+SKIP:
+{
+ my $TEST_NO = 1;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo',
+ scale => 1.15,
+ multi => '0x123400');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 2;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO',
+ scale => 1.15,
+ offset => '0x432100');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 3;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO',
+ scale => 1.15,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, multi wrap
+SKIP:
+{
+ my $TEST_NO = 4;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 5;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO_wrap',
+ scale => 1.15,
+ offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, offsets starts from the value,
+# multi wrap, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 6;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# pg_upgrade tests
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 100;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value
+SKIP:
+{
+ my $TEST_NO = 101;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0x123400');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 102;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0x432100');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 103;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, multi wrap
+SKIP:
+{
+ my $TEST_NO = 104;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 105;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, offsets starts from the value,
+# multi wrap, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 106;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# Self upgrade
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 1000;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'self_upgrade';
+ my $oldnode = create_new_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+done_testing();
--
2.43.0
Attachments:
[text/plain] v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt (25.4K, 3-v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt)
From 2642f597832cbed0ebc54202de4e0f5770ac5f50 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v9 5/7] TEST: initdb option to initialize cluster with
non-standard xid/mxid/mxoff
To date, testing database cluster wraparound has not been easy, because initdb
always initializes the cluster with the default xid/mxid/mxoff. An option to
specify any valid xid/mxid/mxoff at cluster creation makes this easier.
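
The validation that the new -m/-o/-x options apply can be sketched in isolation as follows. This is only an illustration, not code from the patch: parse_start_value and START_VALUE_MAX are hypothetical names, and plain strtoull stands in for PostgreSQL's strtou64. The 0xFFFFFFFF bound mirrors the Start*IsValid() macros this patch adds to c.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Upper bound used by the Start*IsValid() macros in this patch. */
#define START_VALUE_MAX UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical helper (not part of the patch): parse a -m/-o/-x argument.
 * Base 0 lets the caller pass decimal, octal, or 0x-prefixed hex.
 * Returns true and stores the value on success, false otherwise.
 */
static bool
parse_start_value(const char *arg, uint64_t *result)
{
	char	   *endptr;
	unsigned long long val;

	errno = 0;
	val = strtoull(arg, &endptr, 0);

	/* Reject empty input, trailing garbage, overflow, and out-of-range values. */
	if (endptr == arg || *endptr != '\0' || errno != 0 ||
		val > START_VALUE_MAX)
		return false;

	*result = (uint64_t) val;
	return true;
}
```

With this sketch, values such as 0xFFFF7000 (used by the wraparound tests) are accepted, while "seven" or anything above 0xFFFFFFFF is rejected. The additional lower-bound checks that initdb performs for -m and -x are left out here.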
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 ++++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++--
src/backend/bootstrap/bootstrap.c | 50 +++++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 +++-
src/backend/tcop/postgres.c | 53 +++++++++++-
src/bin/initdb/initdb.c | 107 ++++++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++++
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
13 files changed, 382 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the commit log page that holds nextXid */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a817f539ee..095c39dd93 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the offsets log page that holds nextMXact */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -1978,7 +1999,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the members log page that holds nextOffset */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we're not starting from offset zero, initialize dummy multixacts to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index d31a67599c..8c33b8ba9d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -217,7 +217,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -272,12 +272,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid value")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 78e66a06ac..483307279f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -572,7 +572,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -669,10 +669,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -723,6 +731,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 184b830168..4fd594cfe5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..410868dddf 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^32-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^32-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^32-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Reject the value rather than silently raising it to
+ * FirstMultiXactId, although doing so would be harmless.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Reject the value rather than silently raising it to
+ * FirstNormalTransactionId, although doing so would be harmless.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index e1b3187d0b..f770e9a140 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
--
2.43.0
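The option checks in the initdb hunk above can be sketched as a stand-alone helper. This is only an illustration: parse_start_xid() and FIRST_NORMAL_XID are hypothetical names, and parsing with strtoull() is an assumption, since the actual parsing call is not visible in this part of the patch. Only the StartTransactionIdIsValid() macro and the "greater than 2" rule are taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Copied from the c.h hunk of the patch. */
#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)

/* Hypothetical name for the literal 3 (FirstNormalTransactionId) in the patch. */
#define FIRST_NORMAL_XID 3

/* strtoull() must be able to carry any value we store into a uint64_t. */
static_assert(sizeof(unsigned long long) >= sizeof(uint64_t),
			  "unsigned long long is too narrow");

/*
 * Hypothetical helper: parse an initial xid given on the command line.
 * Returns false for non-numeric input, for values above 0xFFFFFFFF, and
 * for values below FirstNormalTransactionId.
 */
bool
parse_start_xid(const char *arg, uint64_t *xid)
{
	char	   *endptr;
	unsigned long long value;

	errno = 0;
	value = strtoull(arg, &endptr, 0);
	if (endptr == arg || *endptr != '\0' || errno != 0)
		return false;			/* not a number at all, e.g. "seven" */
	if (!StartTransactionIdIsValid(value))
		return false;			/* does not fit in 32 bits */
	if (value < FIRST_NORMAL_XID)
		return false;			/* xid should be greater than 2 */

	*xid = value;
	return true;
}
```

With this sketch, "65535" is accepted, while "seven", "2" and "4294967296" are rejected, which matches the cases exercised by the 001_initdb.pl additions.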
[application/octet-stream] v9-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (8.8K, 4-v9-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
From d703fe4538754534817596a0d4f51e06a8c3293f Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 23 Oct 2024 18:23:39 +0300
Subject: [PATCH v9 4/7] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is UINT64_MAX now, the MULTIXACT_MEMBER_SAFE_THRESHOLD
and MULTIXACT_MEMBER_DANGER_THRESHOLD values are no longer meaningful. Thus,
MultiXactMemberFreezeThreshold is no longer needed either.
Instead, switch to a threshold of MULTIXACT_MEMBER_AUTOVAC_THRESHOLD
(0xFFFFFFFF, i.e. 2^32 - 1) members. It is used to decide whether to force autovacuum.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 117 +++----------------------
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/include/access/multixact.h | 1 -
4 files changed, 15 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 48e1c0160a..a817f539ee 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -204,10 +204,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release disk space and, if possible, reduce
+ * table bloat.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -2616,15 +2620,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine whether we need to vacuum to reclaim multixact member space.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2712,10 +2714,10 @@ SetOffsetVacuumLimit(bool is_startup)
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2761,101 +2763,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 86f36b3695..e7506e268a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,7 +1133,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..180bb7e96e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1915,7 +1915,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
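The decision that SetOffsetVacuumLimit() returns after the patch above can be shown in isolation. This is a minimal sketch: members_need_autovacuum() is a hypothetical name, and the shared-memory state is replaced by plain arguments. The threshold value and the "if we're not sure, assume yes" rule come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t MultiXactOffset;

static_assert(sizeof(MultiXactOffset) == 8, "offsets are 64 bits wide");

/* Same value as in the patch: 0xFFFFFFFF members. */
#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical stand-alone version of the return expression of
 * SetOffsetVacuumLimit(): request autovacuum when the oldest offset is
 * unknown, or when more than the threshold number of members is in use.
 */
bool
members_need_autovacuum(bool oldest_offset_known,
						MultiXactOffset next_offset,
						MultiXactOffset oldest_offset)
{
	if (!oldest_offset_known)
		return true;			/* if we're not sure, assume yes */

	return next_offset - oldest_offset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD;
}
```

Note that the comparison is strict, so exactly 0xFFFFFFFF members in use does not yet force autovacuum; one more member does.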
[application/octet-stream] v9-0002-Use-64-bit-multixact-offsets.patch (12.9K, 5-v9-0002-Use-64-bit-multixact-offsets.patch)
From 8cc5477a23b383132fddd4386492c0ffe6b63fb7 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v9 2/7] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 170 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 10 insertions(+), 168 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..48e1c0160a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..e1b3187d0b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -664,7 +664,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
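The change to MultiXactOffsetPrecedes() in the patch above widens the signed difference from 32 to 64 bits. The following is a small sketch of that comparison outside the server; offset_precedes() is a hypothetical name, and the body follows the patched function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t MultiXactOffset;

static_assert(sizeof(MultiXactOffset) == 8, "offsets are 64 bits wide");

/*
 * Hypothetical stand-alone copy of the patched comparison: offset1
 * precedes offset2 when their difference, reinterpreted as a signed
 * 64-bit value, is negative.
 */
bool
offset_precedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
	int64_t		diff = (int64_t) (offset1 - offset2);

	return diff < 0;
}
```

For example, 0xFFFFFFFF now precedes 0x100000000, a pair that could not even be represented while offsets were 32 bits wide.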
[application/octet-stream] v9-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 6-v9-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
From bc77e08c2afae2d0e4ae9222dfff1a77ef2b3f18 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v9 1/7] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 05c738d661..727b6e744f 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -876,8 +876,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.43.0
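
The hunks above all follow one pattern: once MultiXactOffset is 64 bits wide, each printf-style call switches from %u to %llu and casts the argument to unsigned long long. A minimal standalone sketch of that pattern is below. It assumes nothing from the PostgreSQL tree, and the helper name format_next_offset is hypothetical, not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): format a 64-bit offset the same
 * way the patched printf calls do, with %llu plus an explicit cast.
 */
static const char *
format_next_offset(uint64_t offset)
{
    static char buf[64];

    /* The cast keeps the format portable wherever uint64_t is not long long. */
    snprintf(buf, sizeof(buf), "NextMultiOffset: %llu",
             (unsigned long long) offset);
    return buf;
}
```

With the old %u format, a value such as 2^32 would not print correctly, which is presumably why every reporting site is touched.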
[application/octet-stream] v9-0003-Make-pg_upgrade-convert-multixact-offsets.patch (17.9K, 7-v9-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
From d731c49b8c51d57ee4ae0160a4668f9f99d4a2bc Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v9 3/7] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
Author: Yura Sokolov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 42 ++-
src/bin/pg_upgrade/pg_upgrade.h | 14 +-
src/bin/pg_upgrade/segresize.c | 527 ++++++++++++++++++++++++++++++++
5 files changed, 580 insertions(+), 5 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..1654e877c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,8 +750,42 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
- copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ MultiXactOffset oldest_offset,
+ next_offset;
+
+ remove_new_subdir("pg_multixact/offsets", false);
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+ oldest_offset = convert_multixact_offsets();
+ check_ok();
+
+ remove_new_subdir("pg_multixact/members", false);
+ prep_status("Converting pg_multixact/members");
+ convert_multixact_members(oldest_offset);
+ check_ok();
+
+ next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+ if (oldest_offset)
+ {
+ if (next_offset < oldest_offset)
+ next_offset += ((MultiXactOffset) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ {
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ }
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +794,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..2c85ec1e94 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,8 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+MultiXactOffset convert_multixact_offsets(void);
+void convert_multixact_members(MultiXactOffset oldest_offset);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..73064c77de
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,527 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+} SlruSegState;
+
+/*
+ * Mirrors the SlruFileName from slru.c
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ Assert(state->segno >= 0 && state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+}
+
+/*
+ * Create new SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file exists, read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found, which avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+}
+
+typedef uint32 MultiXactOffsetOld;
+
+#define MaxMultiXactOffsetOld ((MultiXactOffsetOld) 0xFFFFFFFF)
+
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(MultiXactOffsetOld))
+#define MULTIXACT_OFFSETS_PER_PAGE_NEW (BLCKSZ / sizeof(MultiXactOffset))
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+MultiXactOffset
+convert_multixact_offsets(void)
+{
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ MultiXactOffsetOld oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE_NEW] = {0},
+ oldest_offset = 0;
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool oldest_offset_known = false;
+
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ /* Copy multi offsets reading only needed segment pages */
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound */
+#define OLD_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_OLD / SLRU_PAGES_PER_SEGMENT)
+ if (oldseg.segno > OLD_OFFSET_SEGNO_MAX)
+ {
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+ }
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Save oldest multi offset */
+ if (!oldest_offset_known)
+ {
+ oldest_offset = oldbuf[old_entry];
+ oldest_offset_known = true;
+ }
+
+ /* Skip wrapped-around invalid MultiXactIds */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+ Assert(new_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ new_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound (1 becomes 2^32) */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1 */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE_NEW)
+ {
+ /* Handle possible segment wraparound */
+#define NEW_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_NEW / SLRU_PAGES_PER_SEGMENT)
+ if (newseg.segno > NEW_OFFSET_SEGNO_MAX)
+ {
+ newseg.segno = 0;
+ newseg.pageno = 0;
+ }
+
+ /* Write new page */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE_NEW - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Use next_offset as oldest_offset, if oldest_multi == next_multi */
+ if (!oldest_offset_known)
+ {
+ Assert(oldest_multi == next_multi);
+ oldest_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ }
+
+ /* Release resources */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ return oldest_offset;
+}
+
+#define MXACT_MEMBERS_FLAG_BYTES 1
+
+#define MULTIXACT_MEMBERS_PER_GROUP 4
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (MULTIXACT_MEMBERS_PER_GROUP * (sizeof(TransactionId) + MXACT_MEMBERS_FLAG_BYTES))
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE \
+ (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERS_PER_GROUP * MULTIXACT_MEMBERGROUPS_PER_PAGE)
+#define MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP \
+ (MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP)
+
+typedef struct MultiXactMembersCtx
+{
+ SlruSegState seg;
+ char buf[BLCKSZ];
+ int group;
+ int member;
+ char *flag;
+ TransactionId *xid;
+} MultiXactMembersCtx;
+
+static void
+MultiXactMembersCtxInit(MultiXactMembersCtx *ctx)
+{
+ ctx->seg.dir = psprintf("%s/pg_multixact/members", new_cluster.pgdata);
+
+ ctx->group = 0;
+ ctx->member = 1; /* skip invalid zero offset */
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+
+ ctx->flag += ctx->member;
+ ctx->xid += ctx->member;
+}
+
+static void
+MultiXactMembersCtxAdd(MultiXactMembersCtx *ctx, char flag, TransactionId xid)
+{
+ /* Copy member's xid and flags to the new page */
+ *ctx->flag++ = flag;
+ *ctx->xid++ = xid;
+
+ if (++ctx->member < MULTIXACT_MEMBERS_PER_GROUP)
+ return;
+
+ /* Start next member group */
+ ctx->member = 0;
+
+ if (++ctx->group >= MULTIXACT_MEMBERGROUPS_PER_PAGE)
+ {
+ /* Write current page and start new */
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ ctx->group = 0;
+ memset(ctx->buf, 0, BLCKSZ);
+ }
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+}
+
+static void
+MultiXactMembersCtxFinit(MultiXactMembersCtx *ctx)
+{
+ if (ctx->flag > (char *) ctx->buf)
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ close_segment(&ctx->seg);
+
+ pfree(ctx->seg.dir);
+}
+
+/*
+ * Convert pg_multixact/members segments, offsets will start from 1.
+ *
+ */
+void
+convert_multixact_members(MultiXactOffset oldest_offset)
+{
+ MultiXactOffset next_offset,
+ offset;
+ SlruSegState oldseg = {0};
+ char oldbuf[BLCKSZ] = {0};
+ int oldidx;
+ MultiXactMembersCtx newctx = {0};
+
+ oldseg.dir = psprintf("%s/pg_multixact/members", old_cluster.pgdata);
+
+ next_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ /* Initialize the old starting position */
+ oldseg.pageno = oldest_offset / MULTIXACT_MEMBERS_PER_PAGE;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ /* Initialize new starting position */
+ MultiXactMembersCtxInit(&newctx);
+
+ /* Iterate through the original directory */
+ oldidx = oldest_offset % MULTIXACT_MEMBERS_PER_PAGE;
+ for (offset = oldest_offset; offset < next_offset;)
+ {
+ bool empty;
+ int oldlen;
+ int ngroups;
+ int oldgroup;
+ int oldmember;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Iterate through the old member groups */
+ ngroups = oldlen / MULTIXACT_MEMBERGROUP_SIZE;
+ oldmember = oldidx % MULTIXACT_MEMBERS_PER_GROUP;
+ oldgroup = oldidx / MULTIXACT_MEMBERS_PER_GROUP;
+ while (oldgroup < ngroups && offset < next_offset)
+ {
+ char *oldflag;
+ TransactionId *oldxid;
+ int i;
+
+ oldflag = (char *) oldbuf + oldgroup * MULTIXACT_MEMBERGROUP_SIZE;
+ oldxid = (TransactionId *)(oldflag + MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP);
+
+ oldxid += oldmember;
+ oldflag += oldmember;
+
+ /* Iterate through the old members */
+ for (i = oldmember;
+ i < MULTIXACT_MEMBERS_PER_GROUP && offset < next_offset;
+ i++)
+ {
+ MultiXactMembersCtxAdd(&newctx, *oldflag++, *oldxid++);
+
+ if (++offset == (uint64) 1 << 32)
+ {
+ Assert(i == MaxMultiXactOffsetOld % MULTIXACT_MEMBERS_PER_GROUP);
+ goto wraparound;
+ }
+ }
+
+ oldgroup++;
+ oldmember = 0;
+ }
+
+ oldidx = 0;
+
+ continue;
+
+wraparound:
+#define SEGNO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE / SLRU_PAGES_PER_SEGMENT)
+#define PAGENO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE % SLRU_PAGES_PER_SEGMENT)
+ Assert((oldseg.segno == SEGNO_MAX && oldseg.pageno == PAGENO_MAX + 1) ||
+ (oldseg.segno == SEGNO_MAX + 1 && oldseg.pageno == 0));
+
+ /* Switch to segment 0000 */
+ close_segment(&oldseg);
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+
+ /* skip invalid zero multi offset */
+ oldidx = 1;
+ }
+
+ MultiXactMembersCtxFinit(&newctx);
+
+ /* Release resources */
+ close_segment(&oldseg);
+
+ pfree(oldseg.dir);
+}
--
2.43.0
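
The offset conversion in segresize.c, and the next_offset adjustment in copy_xact_xlog_xid(), appear to rely on the same arithmetic: old 32-bit offsets skip zero on wraparound, so only 2^32 - 1 distinct values exist, and the new offsets are rebased so that the oldest one becomes 1. The sketch below isolates that step in standalone C. The names rebase_offset and OLD_OFFSET_SPAN are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Number of usable old 32-bit offsets: zero is skipped on wraparound. */
#define OLD_OFFSET_SPAN ((((uint64_t) 1) << 32) - 1)

/*
 * Hypothetical helper (not in the patch): map an old 32-bit member offset
 * to the new 64-bit space, mirroring the inner loop of
 * convert_multixact_offsets().
 */
static uint64_t
rebase_offset(uint32_t old_offset, uint32_t oldest_offset)
{
    uint64_t    offset = old_offset;

    /* An offset below the oldest one must have wrapped around. */
    if (offset < oldest_offset)
        offset += OLD_OFFSET_SPAN;

    /* Shift so that the oldest offset maps to 1. */
    return offset - oldest_offset + 1;
}
```

For example, with an oldest offset of 0xFFFFFC00 (the value used by the wraparound tests), old offset 0xFFFFFFFF maps to 0x400 and the next old offset, 1, maps to 0x401, so the new sequence stays contiguous across the wrap.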
[text/plain] v9-0007-TEST-bump-catver.patch.txt (1.1K, 8-v9-0007-TEST-bump-catver.patch.txt)
From 33e21cf86b1813a67c699d703ab1f75bcf28a7b1 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 13 Nov 2024 16:34:34 +0300
Subject: [PATCH v9 7/7] TEST: bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/catalog/catversion.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..18faedc963 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411112
/*
* large object chunk size added to pg_controldata,
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5dd91e190a..3d09caf5ae 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411111
+#define CATALOG_VERSION_NO 202411112
#endif
--
2.43.0
[text/plain] v9-0006-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch.txt (13.5K, 9-v9-0006-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch.txt)
From 3558ccb4712d50bcda877474db5c9fd124b6e919 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 19 Nov 2024 17:08:10 +0300
Subject: [PATCH v9 6/7] TEST: add src/bin/pg_upgrade/t/005_offset.pl
---
src/bin/pg_upgrade/t/005_offset.pl | 562 +++++++++++++++++++++++++++++
1 file changed, 562 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/005_offset.pl
diff --git a/src/bin/pg_upgrade/t/005_offset.pl b/src/bin/pg_upgrade/t/005_offset.pl
new file mode 100644
index 0000000000..1cfd8b364a
--- /dev/null
+++ b/src/bin/pg_upgrade/t/005_offset.pl
@@ -0,0 +1,562 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# This pair of calls will create significantly more member segments than offset
+# segments.
+sub prep
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+}
+
+sub fill
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ my $nclients = 50;
+ my $update_every = 90;
+ my @connections = ();
+
+ for (1..$nclients)
+ {
+ my $conn = $node->background_psql('postgres');
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ for (my $i = 0; $i < 20000; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} FOR KEY SHARE");
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+}
+
+# This pair of calls will create more or less the same number of members and
+# offsets segments.
+sub prep2
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl}(BAR INT PRIMARY KEY, BAZ INT); " .
+ "CREATE OR REPLACE PROCEDURE MXIDFILLER(N_STEPS INT DEFAULT 1000) " .
+ "LANGUAGE PLPGSQL " .
+ "AS \$\$ " .
+ "BEGIN " .
+ " FOR I IN 1..N_STEPS LOOP " .
+ " UPDATE ${tbl} SET BAZ = RANDOM(1, 1000) " .
+ " WHERE BAR IN (SELECT BAR FROM ${tbl} " .
+ " TABLESAMPLE BERNOULLI(80)); " .
+ " COMMIT; " .
+ " END LOOP; " .
+ "END; \$\$; " .
+ "INSERT INTO ${tbl} (BAR, BAZ) " .
+ "SELECT ID, ID FROM GENERATE_SERIES(1, 1024) ID;");
+}
+
+sub fill2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ $node->safe_psql('postgres',
+ "BEGIN; " .
+ "SELECT * FROM ${tbl} FOR KEY SHARE; " .
+ "PREPARE TRANSACTION 'A'; " .
+ "CALL MXIDFILLER((365 * ${scale})::int); " .
+ "COMMIT PREPARED 'A';");
+}
+
+
+# generate around 2 offset segments and 55 member segments
+sub mxid_gen1
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ prep($node, $tbl);
+ fill($node, $tbl);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# generate around 10 offset segments and 12 member segments
+sub mxid_gen2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ prep2($node, $tbl);
+ fill2($node, $tbl, $scale);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# Fetch latest multixact checkpoint values.
+sub multi_bounds
+{
+ my ($node) = @_;
+ my $path = $node->config_data('--bindir');
+ my ($stdout, $stderr) = run_command([
+ $path . '/pg_controldata',
+ $node->data_dir
+ ]);
+ my @control_data = split("\n", $stdout);
+ my $next = undef;
+ my $oldest = undef;
+ my $next_offset = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiXactId:\s*(.*)$/mg)
+ {
+ $next = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's oldestMultiXid:\s*(.*)$/mg)
+ {
+ $oldest = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+ {
+ $next_offset = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if (defined($oldest) && defined($next) && defined($next_offset))
+ {
+ last;
+ }
+ }
+
+ die "Latest checkpoint's NextMultiXactId not found in control file!\n"
+ unless defined($next);
+
+ die "Latest checkpoint's oldestMultiXid not found in control file!\n"
+ unless defined($oldest);
+
+ die "Latest checkpoint's NextMultiOffset not found in control file!\n"
+ unless defined($next_offset);
+
+ return ($oldest, $next, $next_offset);
+}
+
+# Create node from existing bins.
+sub create_new_node
+{
+ my ($name, %params) = @_;
+
+ create_node(0, @_);
+}
+
+# Create node from ENV oldinstall
+sub create_old_node
+{
+ my ($name, %params) = @_;
+
+ if (!defined($ENV{oldinstall}))
+ {
+ die "oldinstall is not defined";
+ }
+
+ create_node(1, @_);
+}
+
+sub create_node
+{
+ my ($install_path_from_env, $name, %params) = @_;
+ my $scale = defined $params{scale} ? $params{scale} : 1;
+ my $multi = defined $params{multi} ? $params{multi} : undef;
+ my $offset = defined $params{offset} ? $params{offset} : undef;
+
+ my $node =
+ $install_path_from_env ?
+ PostgreSQL::Test::Cluster->new($name,
+ install_path => $ENV{oldinstall}) :
+ PostgreSQL::Test::Cluster->new($name);
+
+ $node->init(force_initdb => 1,
+ extra => [
+ $multi ? ('-m', $multi) : (),
+ $offset ? ('-o', $offset) : (),
+ ]);
+
+ # Fixup MOX patch quirk
+ if ($multi)
+ {
+ unlink $node->data_dir . '/pg_multixact/offsets/0000';
+ }
+ if ($offset)
+ {
+ unlink $node->data_dir . '/pg_multixact/members/0000';
+ }
+
+ $node->append_conf('postgresql.conf', 'fsync = off');
+ $node->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+
+ $node->start();
+ mxid_gen2($node, 'FOO', $scale);
+ mxid_gen1($node, 'BAR', $scale);
+ $node->restart();
+ $node->safe_psql('postgres', q(SELECT * FROM FOO)); # just in case...
+ $node->safe_psql('postgres', q(SELECT * FROM BAR));
+ $node->safe_psql('postgres', q(CHECKPOINT));
+ $node->stop();
+
+ return $node;
+}
+
+sub do_upgrade
+{
+ my ($oldnode, $newnode) = @_;
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--check'
+ ],
+ 'run of pg_upgrade --check');
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ $oldnode->start();
+ $newnode->start();
+
+ my $oldfoo = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newfoo = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldfoo, $newfoo, "select foo eq");
+
+ my $oldbar = $oldnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ my $newbar = $newnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ is($oldbar, $newbar, "select bar eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+
+ multi_bounds($oldnode);
+ multi_bounds($newnode);
+}
+
+my @TESTS = (
+ # tests without ENV oldinstall
+ 0, 1, 2, 3, 4, 5, 6,
+ # tests with "real" pg_upgrade
+ 100, 101, 102, 103, 104, 105, 106,
+ # self upgrade
+ 1000,
+);
+
+# =============================================================================
+# Basic sanity tests on a NEW bin
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 0;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mo',
+ scale => 1);
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value
+SKIP:
+{
+ my $TEST_NO = 1;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo',
+ scale => 1.15,
+ multi => '0x123400');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 2;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO',
+ scale => 1.15,
+ offset => '0x432100');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 3;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO',
+ scale => 1.15,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, multi wrap
+SKIP:
+{
+ my $TEST_NO = 4;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 5;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO_wrap',
+ scale => 1.15,
+ offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, offsets starts from the value,
+# multi wrap, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 6;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# pg_upgrade tests
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 100;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value
+SKIP:
+{
+ my $TEST_NO = 101;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0x123400');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 102;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0x432100');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 103;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from a given value and wraps around
+SKIP:
+{
+ my $TEST_NO = 104;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offset starts from a given value and wraps around
+SKIP:
+{
+ my $TEST_NO = 105;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offset both start from given values,
+# and both wrap around
+SKIP:
+{
+ my $TEST_NO = 106;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# Self upgrade
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 1000;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'self_upgrade';
+ my $oldnode = create_new_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+done_testing();
--
2.43.0