public inbox for [email protected]
From: Maxim Orlov <[email protected]>
To: wenhui qiu <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: make mxidoff 64 bits
Date: Fri, 8 Nov 2024 21:10:28 +0300
Message-ID: <CACG=ezYThNkf8QsDA-aQfEFEkqn2L=_uUL83z0vJstPRasbZqg@mail.gmail.com>
In-Reply-To: <CAGjGUA+dcV7veaCV1H65vCNsbS++nT8=ho772gDvsXUW9H7eXQ@mail.gmail.com>
References: <CACG=ezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com>
<[email protected]>
<CAGjGUAKO1GCzG5wBMt5RosWo0PatgFpYY=Gjgt77tN2brNe=Bg@mail.gmail.com>
<CACG=ezYokoiumOFnqUfg_ffHD5s8T+6iHYfzKLfa=QQ-1pNrBg@mail.gmail.com>
<CACG=ezY9xq73jcX_EjVqx5-f90nbQ9PyhFCTW2fwFCS2wmNiFw@mail.gmail.com>
<CACG=eza+27CfLBobJJccRhXrA3He6c1irAnoyTtSC1-z9UXLrg@mail.gmail.com>
<CAPpHfduczcop9s6gKUpLGgFUe2y4ERGMJx6SS6Kp+s-kQPwMjg@mail.gmail.com>
<CACG=ezbye4g_ERNqE=gBcvQ0YypRaVENhNUu8xrs4PL12UdnUA@mail.gmail.com>
<CACG=ezaMncd0-BcGHBgsSR2eqHfrz9WznHGLKX8biz6zu-azGw@mail.gmail.com>
<[email protected]>
<CACG=ezb9XTvd3ZmS0y8gUunx_wBBdJO7ou+BfCOnnA5jE-11vg@mail.gmail.com>
<CACG=ezYFNqGjsxF6Vb2CHF6JzKcjhAFauaFm9js0nu_3Ngcdkw@mail.gmail.com>
<CAGjGUA+dcV7veaCV1H65vCNsbS++nT8=ho772gDvsXUW9H7eXQ@mail.gmail.com>
On Fri, 25 Oct 2024 at 06:39, wenhui qiu <[email protected]> wrote:
>
> + * Multixact members warning threshold.
> + *
> + * If the difference between nextOffset and oldestOffset exceeds this value, we
> + * trigger autovacuum in order to release the disk space if possible.
> + */
> +#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
> Can we refine this annotation a bit? For example:
>
Thank you, fixed.
Sorry for the late reply. There was a problem with pg_upgrade when the
multixact offsets had wrapped around. Here is a fixed version, with a test
added. I decided to use my old patch to initialize the old cluster with
non-standard multixact values, fill it with data, and run pg_upgrade.
Here is how to test. All the patches apply on top of commit
14e87ffa5c543b5f3 on the master branch.
1) Check out master at 14e87ffa5c543b5f3 and apply the patches
0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch and
0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch.
2) Check out master at 14e87ffa5c543b5f3 in a separate directory and apply
the v6 patch set.
3) Build both trees.
4) Run the test, using the oldinstall environment variable to point at the
old installation:
PROVE_TESTS=t/005_mxidoff.pl oldinstall=/home/orlov/proj/pgsql-new PG_TEST_NOCLEAN=1 make check -C src/bin/pg_upgrade/
If needed, I may write a shell script to automate these steps.
--
Best regards,
Maxim Orlov.
Attachments:
[application/octet-stream] v6-0001-Use-64-bit-multixact-offsets.patch (13.3K, 3-v6-0001-Use-64-bit-multixact-offsets.patch)
From 2a8708fa5d31c6523c7d2654ee1215beda6f1ff0 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v6 1/6] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..e1b3187d0b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -664,7 +664,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
[application/octet-stream] v6-0002-Make-pg_upgrade-convert-multixact-offsets.patch (18.3K, 4-v6-0002-Make-pg_upgrade-convert-multixact-offsets.patch)
From a48ec9aaf3de859050dd0ad484dc1fb5f174cf8a Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v6 2/6] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
Author: Yura Sokolov <[email protected]>
---
src/backend/access/transam/multixact.c | 2 +-
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 42 +-
src/bin/pg_upgrade/pg_upgrade.h | 14 +-
src/bin/pg_upgrade/segresize.c | 518 +++++++++++++++++++++++++
6 files changed, 572 insertions(+), 6 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c51e03e832..48e1c0160a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1891,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- true);
+ false);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..1654e877c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,8 +750,42 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
- copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ /*
+ * If the old cluster predates MULTIXACTOFFSET_FORMATCHANGE_CAT_VER, it
+ * has 32-bit multixact offsets, which must be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ MultiXactOffset oldest_offset,
+ next_offset;
+
+ remove_new_subdir("pg_multixact/offsets", false);
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+ oldest_offset = convert_multixact_offsets();
+ check_ok();
+
+ remove_new_subdir("pg_multixact/members", false);
+ prep_status("Converting pg_multixact/members");
+ convert_multixact_members(oldest_offset);
+ check_ok();
+
+ next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+ if (oldest_offset)
+ {
+ if (next_offset < oldest_offset)
+ next_offset += ((MultiXactOffset) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ {
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ }
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +794,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..2c85ec1e94 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit multixact offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,8 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+MultiXactOffset convert_multixact_offsets(void);
+void convert_multixact_members(MultiXactOffset oldest_offset);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..ff7ff65758
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,518 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * An iterator over the pages of an SLRU directory. The idea is to specify
+ * the starting segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ *
+ * NOTE: this function should mirror SlruFileName call.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+
+ return psprintf("%s/%04X", state->dir, (unsigned int) (state->segno));
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read the next page from an old-cluster SLRU segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file exists, read page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write the next page to a new-cluster SLRU segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we haven't done so yet. Creation is
+ * postponed until the first non-empty page is written. This avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+}
+
+typedef uint32 MultiXactOffsetOld;
+
+#define MaxMultiXactOffsetOld ((MultiXactOffsetOld) 0xFFFFFFFF)
+
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(MultiXactOffsetOld))
+#define MULTIXACT_OFFSETS_PER_PAGE_NEW (BLCKSZ / sizeof(MultiXactOffset))
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+MultiXactOffset
+convert_multixact_offsets(void)
+{
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ MultiXactOffsetOld oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE_NEW] = {0},
+ oldest_offset = 0;
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi;
+ uint64 old_entry;
+ uint64 new_entry;
+ bool oldest_offset_known = false;
+
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ /* Copy multixact offsets, reading only the segment pages we need */
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool is_empty;
+
+ /* Handle possible segment wraparound */
+ if (oldseg.segno > MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_OLD / SLRU_PAGES_PER_SEGMENT)
+ oldseg.segno = 0;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &is_empty);
+
+ if (oldlen <= 0 || is_empty)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ if (oldlen < BLCKSZ)
+ memset((char *) oldbuf + oldlen, 0, BLCKSZ - oldlen);
+
+ /* Save oldest multi offset */
+ if (!oldest_offset_known)
+ {
+ oldest_offset = oldbuf[old_entry];
+ oldest_offset_known = true;
+ }
+
+ /* Skip wrapped-around invalid MultiXactIds */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound (1 becomes 2^32) */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1 */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE_NEW)
+ {
+ /* Write new page */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE_NEW - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Use next_offset as oldest_offset, if oldest_multi == next_multi */
+ if (!oldest_offset_known)
+ {
+ Assert(oldest_multi == next_multi);
+ oldest_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ }
+
+ /* Release resources */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ return oldest_offset;
+}
+
+#define MXACT_MEMBERS_FLAG_BYTES 1
+
+#define MULTIXACT_MEMBERS_PER_GROUP 4
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (MULTIXACT_MEMBERS_PER_GROUP * (sizeof(TransactionId) + MXACT_MEMBERS_FLAG_BYTES))
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE \
+ (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERS_PER_GROUP * MULTIXACT_MEMBERGROUPS_PER_PAGE)
+#define MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP \
+ (MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP)
+
+typedef struct MultiXactMembersCtx
+{
+ SlruSegState seg;
+ char buf[BLCKSZ];
+ int group;
+ int member;
+ char *flag;
+ TransactionId *xid;
+} MultiXactMembersCtx;
+
+static void
+MultiXactMembersCtxInit(MultiXactMembersCtx *ctx)
+{
+ ctx->seg.dir = psprintf("%s/pg_multixact/members", new_cluster.pgdata);
+
+ ctx->group = 0;
+ ctx->member = 1; /* skip invalid zero offset */
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+
+ ctx->flag += ctx->member;
+ ctx->xid += ctx->member;
+}
+
+static void
+MultiXactMembersCtxAdd(MultiXactMembersCtx *ctx, char flag, TransactionId xid)
+{
+ /* Copy member's xid and flags to the new page */
+ *ctx->flag++ = flag;
+ *ctx->xid++ = xid;
+
+ if (++ctx->member < MULTIXACT_MEMBERS_PER_GROUP)
+ return;
+
+ /* Start next member group */
+ ctx->member = 0;
+
+ if (++ctx->group >= MULTIXACT_MEMBERGROUPS_PER_PAGE)
+ {
+ /* Write current page and start new */
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ ctx->group = 0;
+ memset(ctx->buf, 0, BLCKSZ);
+ }
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+}
+
+static void
+MultiXactMembersCtxFinit(MultiXactMembersCtx *ctx)
+{
+ if (ctx->flag > (char *) ctx->buf)
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ close_segment(&ctx->seg);
+
+ pfree(ctx->seg.dir);
+}
+
+/*
+ * Convert pg_multixact/members segments; the new offsets start from 1.
+ */
+void
+convert_multixact_members(MultiXactOffset oldest_offset)
+{
+ MultiXactOffset next_offset;
+ MultiXactOffset offset;
+ SlruSegState oldseg = {0};
+ char oldbuf[BLCKSZ] = {0};
+ int oldidx;
+ MultiXactMembersCtx newctx = {0};
+
+ oldseg.dir = psprintf("%s/pg_multixact/members", old_cluster.pgdata);
+
+ next_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ /* Initialize the old starting position */
+ oldseg.pageno = oldest_offset / MULTIXACT_MEMBERS_PER_PAGE;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ /* Initialize new starting position */
+ MultiXactMembersCtxInit(&newctx);
+
+ /* Iterate through the original directory */
+ oldidx = oldest_offset % MULTIXACT_MEMBERS_PER_PAGE;
+ for (offset = oldest_offset; offset < next_offset;)
+ {
+ bool empty;
+ int oldlen;
+ int ngroups;
+ int oldgroup;
+ int oldmember;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Iterate through the old member groups */
+ ngroups = oldlen / MULTIXACT_MEMBERGROUP_SIZE;
+ oldmember = oldidx % MULTIXACT_MEMBERS_PER_GROUP;
+ oldgroup = oldidx / MULTIXACT_MEMBERS_PER_GROUP;
+ while (oldgroup < ngroups && offset < next_offset)
+ {
+ char *oldflag;
+ TransactionId *oldxid;
+ int i;
+
+ oldflag = (char *) oldbuf + oldgroup * MULTIXACT_MEMBERGROUP_SIZE;
+ oldxid = (TransactionId *)(oldflag + MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP);
+
+ oldxid += oldmember;
+ oldflag += oldmember;
+
+ /* Iterate through the old members */
+ for (i = oldmember;
+ i < MULTIXACT_MEMBERS_PER_GROUP && offset < next_offset;
+ i++)
+ {
+ MultiXactMembersCtxAdd(&newctx, *oldflag++, *oldxid++);
+
+ if (++offset == (uint64) 1 << 32)
+ {
+ Assert(i == MaxMultiXactOffsetOld % MULTIXACT_MEMBERS_PER_GROUP);
+ goto wraparound;
+ }
+ }
+
+ oldgroup++;
+ oldmember = 0;
+ }
+
+ oldidx = 0;
+
+ continue;
+
+wraparound:
+#define SEGNO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE / SLRU_PAGES_PER_SEGMENT)
+#define PAGENO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE % SLRU_PAGES_PER_SEGMENT)
+ Assert((oldseg.segno == SEGNO_MAX && oldseg.pageno == PAGENO_MAX + 1) ||
+ (oldseg.segno == SEGNO_MAX + 1 && oldseg.pageno == 0));
+
+ /* Switch to segment 0000 */
+ close_segment(&oldseg);
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+
+ /* skip invalid zero multi offset */
+ oldidx = 1;
+ }
+
+ MultiXactMembersCtxFinit(&newctx);
+
+ /* Release resources */
+ close_segment(&oldseg);
+
+ pfree(oldseg.dir);
+}
--
2.43.0
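The wraparound handling in convert_multixact_members() keeps a 64-bit loop counter that runs past 2^32, while the old files are addressed with 32-bit offsets that restart at 1 after the wrap (offset 0 is skipped, as the patch's "skip invalid zero multi offset" comment notes). Below is a minimal sketch of that bookkeeping. logical_next_offset() and old_physical_offset() are hypothetical helpers, not part of the patch. They model only the next_offset adjustment and the jump back to segment 0000 with oldidx = 1, and they assume at most one 32-bit wrap between oldest_offset and the next offset.

```c
#include <assert.h>
#include <stdint.h>

/* First value that no longer fits in the old 32-bit MultiXactOffset. */
#define OLD_OFFSET_WRAP ((uint64_t) 1 << 32)

/*
 * Hypothetical helper mirroring the next_offset adjustment: if the 32-bit
 * next offset is numerically below the oldest one, it has wrapped, so move it
 * past the wrap point, minus one because offset 0 is skipped after a wrap.
 */
static uint64_t
logical_next_offset(uint32_t oldest_offset, uint32_t next_offset)
{
    uint64_t next = next_offset;

    if (next < oldest_offset)
        next += OLD_OFFSET_WRAP - 1;
    return next;
}

/*
 * Hypothetical helper: map the loop's 64-bit counter back to the 32-bit
 * offset whose member is read from the old files.  Counter values from
 * 2^32 upward correspond to old offsets 1, 2, ... (offset 0 is skipped).
 */
static uint32_t
old_physical_offset(uint64_t counter)
{
    assert(counter < 2 * OLD_OFFSET_WRAP);   /* model covers a single wrap only */

    if (counter < OLD_OFFSET_WRAP)
        return (uint32_t) counter;
    return (uint32_t) (counter - OLD_OFFSET_WRAP + 1);
}
```

For example, take oldest_offset = 0xFFFFFFF0 and a wrapped next offset of 5. The loop then runs while the counter is below 2^32 + 4, and that bound maps back to old offset 5. So the last member copied comes from old offset 4.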
[application/octet-stream] v6-0004-TEST-lower-SLRU_PAGES_PER_SEGMENT-set-bump-catver.patch (2.1K, 5-v6-0004-TEST-lower-SLRU_PAGES_PER_SEGMENT-set-bump-catver.patch)
From 970940711a6a4eab4e30f05412dba90fe2570433 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 29 Oct 2024 18:28:40 +0300
Subject: [PATCH v6 4/6] TEST: lower SLRU_PAGES_PER_SEGMENT + set bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/bin/pg_upgrade/segresize.c | 2 +-
src/include/access/slru.h | 2 +-
src/include/catalog/catversion.h | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..01252a7ed5 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411082
/*
* large object chunk size added to pg_controldata,
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
index ff7ff65758..0547b51741 100644
--- a/src/bin/pg_upgrade/segresize.c
+++ b/src/bin/pg_upgrade/segresize.c
@@ -13,7 +13,7 @@
#include "access/multixact.h"
/* See slru.h */
-#define SLRU_PAGES_PER_SEGMENT 32
+#define SLRU_PAGES_PER_SEGMENT 2
/*
* Some kind of iterator associated with a particular SLRU segment. The idea is
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 97e612cd10..74dd54819d 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -36,7 +36,7 @@
* take no explicit notice of that fact in slru.c, except when comparing
* segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
*/
-#define SLRU_PAGES_PER_SEGMENT 32
+#define SLRU_PAGES_PER_SEGMENT 2
/*
* Page status codes. Note that these do not include the "dirty" bit.
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 86436e0356..05048a512b 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411081
+#define CATALOG_VERSION_NO 202411082
#endif
--
2.43.0
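Lowering SLRU_PAGES_PER_SEGMENT from 32 to 2 shrinks each segment file sixteenfold. Presumably this lets the segment-switching paths in pg_upgrade be exercised with far less data. The arithmetic below is a minimal sketch that assumes the default BLCKSZ of 8192 and the upstream members layout: 4 members per 20-byte group, which gives 409 groups and 1636 members per page. It mirrors how convert_multixact_members() splits an offset into segno, pageno, group and member. member_position() and members_per_segment() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: BLCKSZ 8192, 4 xids + 4 flag bytes per group -> 409 groups/page. */
#define MEMBERS_PER_GROUP  4
#define MEMBERS_PER_PAGE   (409 * MEMBERS_PER_GROUP)   /* 1636 */

typedef struct
{
    uint64_t segno;   /* segment file number */
    uint64_t pageno;  /* page within that segment */
    int      group;   /* member group within the page */
    int      member;  /* member within the group */
} MemberPos;

/* Hypothetical helper: locate an offset, like oldseg.segno/pageno and oldidx. */
static MemberPos
member_position(uint64_t offset, int pages_per_segment)
{
    MemberPos pos;
    uint64_t  page = offset / MEMBERS_PER_PAGE;
    int       idx = (int) (offset % MEMBERS_PER_PAGE);

    assert(pages_per_segment > 0);
    pos.segno = page / (uint64_t) pages_per_segment;
    pos.pageno = page % (uint64_t) pages_per_segment;
    pos.group = idx / MEMBERS_PER_GROUP;
    pos.member = idx % MEMBERS_PER_GROUP;
    return pos;
}

/* Hypothetical helper: how many members one segment file can hold. */
static uint64_t
members_per_segment(int pages_per_segment)
{
    return (uint64_t) MEMBERS_PER_PAGE * (uint64_t) pages_per_segment;
}
```

Under these assumptions, the last 32-bit offset, 0xFFFFFFFF, lands on segment 82040, page 5 with 32 pages per segment, and on segment 1312642, page 1 with 2 pages per segment. In both cases it is member 3 of group 258, which matches the Assert on MaxMultiXactOffsetOld % MULTIXACT_MEMBERS_PER_GROUP in the conversion loop.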
[application/octet-stream] v6-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch (25.4K, 6-v6-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch)
From 6e959f89e37614b94d3c4dd5695355095e8c38fd Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v6 5/6] TEST: initdb option to initialize cluster with
non-standard xid/mxid/mxoff
Until now, testing database cluster wraparound was not easy, because initdb
always initialized the cluster with the default xid/mxid/mxoff. An option to
specify any valid xid/mxid/mxoff at cluster initialization makes such testing
easier.
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 ++++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++--
src/backend/bootstrap/bootstrap.c | 50 +++++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 +++-
src/backend/tcop/postgres.c | 53 +++++++++++-
src/bin/initdb/initdb.c | 107 ++++++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++++
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
13 files changed, 382 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the commit log page containing nextXid */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a817f539ee..095c39dd93 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the offsets page containing nextMXact */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -1978,7 +1999,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the members page containing nextOffset */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we are not starting from offset zero, initialize dummy multixacts to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ed59dfce89..05ce03a3a3 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -216,7 +216,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -271,12 +271,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid value")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 8bee1fb664..af4b004e04 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -562,7 +562,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -659,10 +659,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -713,6 +721,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index aac0b96bbc..1f0e27b9bf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..410868dddf 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^32-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^32-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^32-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Reject the value rather than silently
+ * clamping it to FirstMultiXactId, which would be harmless.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Reject the value rather than silently
+ * clamping it to FirstNormalTransactionId, which would be harmless.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index e1b3187d0b..f770e9a140 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
--
2.43.0
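The initdb, bootstrap and single-user code paths all parse -x/-m/-o with strtou64() and then check the result with the Start*IsValid() macros. start_xid and start_mxid are 32-bit types, so a value above 2^32 - 1 is reduced modulo 2^32 on assignment before the macro sees it. The sketch below shows one way to range-check in 64 bits before narrowing. It uses the standard strtoull() in place of PostgreSQL's strtou64() macro, and parse_start_value() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a start value the way the patch does (base 0,
 * whole string consumed, errno unchanged), but reject anything above
 * 2^32 - 1 while the value is still 64 bits wide.
 */
static bool
parse_start_value(const char *arg, uint32_t *result)
{
    char               *endptr;
    unsigned long long  val;

    assert(arg != NULL && result != NULL);

    errno = 0;
    val = strtoull(arg, &endptr, 0);

    if (endptr == arg || *endptr != '\0' || errno != 0)
        return false;           /* not a number, trailing junk, or overflow */
    if (val > UINT32_MAX)
        return false;           /* would be truncated when narrowed */

    *result = (uint32_t) val;
    return true;
}
```

Callers would still apply their own lower bounds after a successful parse, such as initdb's rejection of xids below FirstNormalTransactionId (3). Note that strtoull() accepts a leading minus sign and negates the result, so "-1" parses as ULLONG_MAX and is rejected by the range check.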
[application/octet-stream] v6-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (8.8K, 7-v6-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
From d5f1e8880a5f072c389274954b21f982797af47e Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 23 Oct 2024 18:23:39 +0300
Subject: [PATCH v6 3/6] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is now UINT64_MAX, the MULTIXACT_MEMBER_SAFE_THRESHOLD
and MULTIXACT_MEMBER_DANGER_THRESHOLD values are no longer meaningful. Thus,
MultiXactMemberFreezeThreshold is not needed either.
Instead, switch to a single MULTIXACT_MEMBER_AUTOVAC_THRESHOLD (2^32 - 1)
members threshold, used to decide whether autovacuum must be forced.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 117 +++----------------------
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/include/access/multixact.h | 1 -
4 files changed, 15 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 48e1c0160a..a817f539ee 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -204,10 +204,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release disk space and, if possible, reduce
+ * table bloat.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -2616,15 +2620,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine whether we need to vacuum to free up member space.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2712,10 +2714,10 @@ SetOffsetVacuumLimit(bool is_startup)
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2761,101 +2763,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 86f36b3695..e7506e268a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,7 +1133,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..180bb7e96e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1915,7 +1915,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
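With 64-bit member offsets, the patch above drops the emergency clamping done by MultiXactMemberFreezeThreshold(). SetOffsetVacuumLimit() now simply requests autovacuum once more than MULTIXACT_MEMBER_AUTOVAC_THRESHOLD (0xFFFFFFFF) member slots are in use. The following is a minimal standalone sketch of that return condition. It assumes MultiXactOffset is a uint64_t and that 64-bit offsets never wrap in practice. needs_member_autovacuum() is a hypothetical name, not a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: with this patch series MultiXactOffset is 64 bits wide. */
typedef uint64_t MultiXactOffset;

/* Mirrors MULTIXACT_MEMBER_AUTOVAC_THRESHOLD, UINT64CONST(0xFFFFFFFF). */
#define MEMBER_AUTOVAC_THRESHOLD UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical standalone version of the return condition in
 * SetOffsetVacuumLimit(): request autovacuum if the oldest offset is
 * unknown ("if we're not sure, assume yes"), or if more than
 * MEMBER_AUTOVAC_THRESHOLD member slots are currently in use.
 */
static bool
needs_member_autovacuum(bool oldest_known,
                        MultiXactOffset next_offset,
                        MultiXactOffset oldest_offset)
{
    /* Assumption: 64-bit offsets do not wrap, so next >= oldest. */
    assert(!oldest_known || next_offset >= oldest_offset);

    return !oldest_known ||
        (next_offset - oldest_offset > MEMBER_AUTOVAC_THRESHOLD);
}
```

Because the offsets no longer wrap, the plain unsigned subtraction is enough. The caller gets a yes/no answer instead of the graduated freeze-age clamping that the removed function provided.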
[application/octet-stream] v6-0006-TEST-add-basic-mxidoff64-tests-005_mxidoff.pl.patch (10.6K, 8-v6-0006-TEST-add-basic-mxidoff64-tests-005_mxidoff.pl.patch)
From 386cfe747bc4ccd867f3e27f5f7669c8eb7692f3 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Sat, 2 Nov 2024 10:46:16 +0300
Subject: [PATCH v6 6/6] TEST: add basic mxidoff64 tests 005_mxidoff.pl
---
src/bin/pg_upgrade/t/005_mxidoff.pl | 389 ++++++++++++++++++++++++++++
1 file changed, 389 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/005_mxidoff.pl
diff --git a/src/bin/pg_upgrade/t/005_mxidoff.pl b/src/bin/pg_upgrade/t/005_mxidoff.pl
new file mode 100644
index 0000000000..e595870543
--- /dev/null
+++ b/src/bin/pg_upgrade/t/005_mxidoff.pl
@@ -0,0 +1,389 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if (!defined($ENV{oldinstall}))
+{
+ die "oldinstall is not defined";
+}
+
+sub mxid_prepare
+{
+ my ($node) = @_;
+
+ $node->safe_psql('postgres',
+ q(
+ CREATE TABLE FOO(BAR INT PRIMARY KEY, BAZ INT);
+ CREATE OR REPLACE PROCEDURE MXIDFILLER(N_STEPS INT DEFAULT 1000)
+ LANGUAGE PLPGSQL
+ AS $$
+ BEGIN
+ FOR I IN 1..N_STEPS LOOP
+ UPDATE FOO SET BAZ = RANDOM(1, 1000)
+ WHERE BAR IN (SELECT BAR FROM FOO TABLESAMPLE BERNOULLI(80));
+ COMMIT;
+ END LOOP;
+ END;$$;
+ INSERT INTO FOO (BAR, BAZ) SELECT ID, ID FROM GENERATE_SERIES(1, 512) ID;
+ ));
+}
+
+sub mxid_fill
+{
+ my ($node) = @_;
+
+ $node->safe_psql('postgres',
+ q(
+ BEGIN;
+ SELECT * FROM FOO FOR KEY SHARE;
+ PREPARE TRANSACTION 'A';
+ CALL MXIDFILLER(365);
+ COMMIT PREPARED 'A';
+ ),
+ timeout => 3600);
+}
+
+# Fetch latest multixact checkpoint values.
+sub multi_bounds
+{
+ my ($node) = @_;
+ my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+ my @control_data = split("\n", $stdout);
+ my $next = undef;
+ my $oldest = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiXactId:\s*(.*)$/mg)
+ {
+ $next = $1;
+ }
+
+ if ($_ =~ /^Latest checkpoint's oldestMultiXid:\s*(.*)$/mg)
+ {
+ $oldest = $1;
+ }
+
+ if (defined($oldest) && defined($next))
+ {
+ last;
+ }
+ }
+
+ die "Latest checkpoint's NextMultiXactId not found in control file!\n"
+ unless defined($next);
+
+ die "Latest checkpoint's oldestMultiXid not found in control file!\n"
+ unless defined($oldest);
+
+ return ($oldest, $next);
+}
+
+# List pg_multixact/offsets segments filenames.
+sub list_actual_multixact_offsets
+{
+ my ($node) = @_;
+ my $dir;
+
+ opendir($dir, $node->data_dir . '/pg_multixact/offsets') or die $!;
+ my @list = sort grep { /^[0-9A-F]+$/ } readdir $dir;
+ closedir $dir;
+
+ return @list;
+}
+
+use constant SIZEOF_MULTI_XACT_OFFSET => 8;
+use constant BLCKSZ => 8192;
+use constant MULTIXACT_OFFSETS_PER_PAGE => BLCKSZ / SIZEOF_MULTI_XACT_OFFSET;
+use constant SLRU_PAGES_PER_SEGMENT => 2;
+
+# See src/backend/access/transam/multixact.c
+sub MultiXactIdToOffsetSegment
+{
+ my ($multi) = @_;
+
+ return int($multi / MULTIXACT_OFFSETS_PER_PAGE / SLRU_PAGES_PER_SEGMENT);
+}
+
+# Validate pg_multixact/offsets segments conversion.
+sub validate_multixact_offsets
+{
+ my ($old, $new, $oldnode) = @_;
+ my ($oldest, $next) = multi_bounds($oldnode);
+ my $maxsegno = MultiXactIdToOffsetSegment($next);
+ my $maxsegname = sprintf("%04X", $maxsegno);
+
+ note(">>>>>>>>>");
+ foreach my $segname ( @$old )
+ {
+ my $segno = hex($segname) * 2;
+ my $converted1 = sprintf("%04X", $segno);
+ my $converted2 = sprintf("%04X", $segno + 1);
+
+ note "[${segname}] -> [${converted1}, ${converted2}]";
+ # Skip the last segment as it may be incomplete.
+ if (not $converted1 eq $maxsegname)
+ {
+ die "Segment ${segname} is not properly converted"
+ unless grep { $converted1 eq $_ } @$new and
+ grep { $converted2 eq $_ } @$new;
+ }
+ }
+ note(">>>>>>>>>");
+
+ return 1;
+}
+
+#
+# Select tests to run.
+#
+my @tests = (0, 1, 2, 3);
+
+# =============================================================================
+# CASE 0
+#
+# Freshly initialized cluster: no multixacts are generated before pg_upgrade.
+# =============================================================================
+SKIP:
+{
+ skip "case 0", 3
+ unless ( grep( /^0$/, @tests ) );
+
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node0',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1);
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node0');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case0: offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT 1));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT 1));
+ is($oldval, $newval, "case0: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+# =============================================================================
+# CASE 1
+#
+# There must be several segments, starting from segment zero.
+# =============================================================================
+SKIP:
+{
+ skip "case 1", 3
+ unless ( grep( /^1$/, @tests ) );
+
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node1',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1);
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+ $oldnode->start();
+
+ mxid_prepare($oldnode);
+ mxid_fill($oldnode);
+
+ $oldnode->safe_psql('postgres', q(CHECKPOINT));
+ $oldnode->stop();
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node1');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case1: offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldval, $newval, "case1: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+# =============================================================================
+# CASE 2
+#
+# Non-standard oldestMultiXid and NextMultiXactId.
+# There must be several segments, starting from a non-zero segment.
+# =============================================================================
+SKIP:
+{
+ skip "case 2", 3
+ unless ( grep( /^2$/, @tests ) );
+
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node2',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1,
+ extra => [
+ '-m', '0x123000', '-o', '0x123000'
+ ]);
+
+ # Fixup MOX patch quirk
+ unlink $oldnode->data_dir . '/pg_multixact/members/0000';
+ unlink $oldnode->data_dir . '/pg_multixact/offsets/0000';
+
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+ $oldnode->start();
+
+ mxid_prepare($oldnode);
+ mxid_fill($oldnode);
+
+ $oldnode->safe_psql('postgres', q(CHECKPOINT));
+ $oldnode->stop();
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node2');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case2: non-standard offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldval, $newval, "case2: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+# =============================================================================
+# CASE 3
+#
+# Non-standard oldestMultiXid and NextMultiXactId close to 2^32, to exercise
+# multixact wraparound.
+# =============================================================================
+SKIP:
+{
+ skip "case 3", 3
+ unless ( grep( /^3$/, @tests ) );
+ chdir ${PostgreSQL::Test::Utils::tmp_check};
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node3',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1,
+ extra => [
+ '-m', '0xFFFF0000', '-o', '0xFFFF0000'
+ ]);
+
+ # Fixup MOX patch quirk
+ unlink $oldnode->data_dir . '/pg_multixact/members/0000';
+ unlink $oldnode->data_dir . '/pg_multixact/offsets/0000';
+
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+ $oldnode->start();
+
+ mxid_prepare($oldnode);
+ mxid_fill($oldnode);
+ mxid_fill($oldnode);
+
+ $oldnode->safe_psql('postgres', q(CHECKPOINT));
+ $oldnode->stop();
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node3');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case3: multi wrap, non-standard offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldval, $newval, "case3: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+done_testing();
--
2.43.0
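The check in validate_multixact_offsets() relies on a simple property of the page layout. An 8-byte offsets page holds half as many entries as a 4-byte one (BLCKSZ / 8 = 1024 versus BLCKSZ / 4 = 2048), so old offsets segment N is split into new segments 2N and 2N + 1. Below is a small C sketch of that arithmetic. It assumes BLCKSZ = 8192. offset_segment() and first_new_segment() are hypothetical helpers that mirror the test's MultiXactIdToOffsetSegment() and its `hex($segname) * 2`. They are not backend functions.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192

/* Entries per offsets page before (32-bit) and after (64-bit) the patch. */
#define OLD_OFFSETS_PER_PAGE ((uint64_t) (BLCKSZ / sizeof(uint32_t)))
#define NEW_OFFSETS_PER_PAGE ((uint64_t) (BLCKSZ / sizeof(uint64_t)))

/*
 * Hypothetical helper: number of the segment that holds the offset entry
 * of "multi", for a given number of entries per page and pages per segment.
 */
static uint64_t
offset_segment(uint64_t multi, uint64_t offsets_per_page,
               uint64_t pages_per_segment)
{
    assert(offsets_per_page > 0 && pages_per_segment > 0);
    return multi / offsets_per_page / pages_per_segment;
}

/* Hypothetical helper: old segment N becomes new segments 2N and 2N + 1. */
static uint64_t
first_new_segment(uint64_t old_segment)
{
    return old_segment * 2;
}
```

With SLRU_PAGES_PER_SEGMENT = 2, multixact 0x123000 lives in old segment 0123 and in new segment 0246. The old segment 0123 therefore maps to 0246 and 0247, which matches the `%04X` names the test compares.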
[application/octet-stream] 0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch (784B, 9-0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch)
From 57f96bdfe7b78794e7abe8802550e4a31e6c9370 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Fri, 8 Nov 2024 20:56:27 +0300
Subject: [PATCH 2/2] TEST: lower SLRU_PAGES_PER_SEGMENT
---
src/include/access/slru.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 97e612cd10..74dd54819d 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -36,7 +36,7 @@
* take no explicit notice of that fact in slru.c, except when comparing
* segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
*/
-#define SLRU_PAGES_PER_SEGMENT 32
+#define SLRU_PAGES_PER_SEGMENT 2
/*
* Page status codes. Note that these do not include the "dirty" bit.
--
2.43.0
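Lowering SLRU_PAGES_PER_SEGMENT from 32 to 2 only shrinks each segment file. As a result, a modest test workload produces many more pg_multixact/offsets segments for pg_upgrade to convert. The following sketch estimates how many multixacts one segment covers. It assumes BLCKSZ = 8192 and the 8-byte offsets introduced by this series. multis_per_offsets_segment() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192

/*
 * Hypothetical helper: number of multixacts whose offset entries fit into
 * one pg_multixact/offsets segment file, for a given on-disk offset size
 * (in bytes) and a given SLRU_PAGES_PER_SEGMENT.
 */
static uint64_t
multis_per_offsets_segment(uint64_t offset_size, uint64_t pages_per_segment)
{
    assert(offset_size > 0 && BLCKSZ % offset_size == 0);
    return (BLCKSZ / offset_size) * pages_per_segment;
}
```

With 8-byte offsets, one segment covers 32768 multixacts at the default of 32 pages, but only 2048 at 2 pages.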
[application/octet-stream] 0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch (33.0K, 10-0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch)
From 34623803146a152796b611421dd9684e4fefa785 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH 1/2] Add initdb option to initialize cluster with non-standard
xid/mxid/mxoff.
Until now, testing database cluster wraparound was not easy, as initdb always
initialized the cluster with the default xid/mxid/mxoff. The option to specify
any valid xid/mxid/mxoff at cluster initialization makes such tests easier.
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 +++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++-
src/backend/bootstrap/bootstrap.c | 50 +++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 ++-
src/backend/tcop/postgres.c | 53 ++++++++++-
src/bin/initdb/initdb.c | 107 +++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++
src/bin/pg_amcheck/t/004_verify_heapam.pl | 35 +++----
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +-
src/test/regress/pg_regress.c | 3 +-
src/test/xid-64/t/001_test_large_xids.pl | 54 +++++++++++
17 files changed, 460 insertions(+), 32 deletions(-)
create mode 100644 src/test/xid-64/t/001_test_large_xids.pl
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the commit log page containing nextXid */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..017eff07bd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2035,6 +2035,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -2046,6 +2047,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the offsets log page containing nextMXact */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -2058,7 +2079,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the members log page containing nextOffset */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we're not starting from offset zero, initialize dummy multixacts to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ed59dfce89..38165eb796 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -216,7 +216,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -271,12 +271,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid value")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 8bee1fb664..af4b004e04 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -562,7 +562,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -659,10 +659,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -713,6 +721,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index aac0b96bbc..4636d99b2f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..1cc54392e5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^62-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^62-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^62-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Reject 0 rather than silently raising mxid to
+ * FirstMultiXactId, even though that would be harmless.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Reject xid < 3 rather than silently raising it to
+ * FirstNormalTransactionId, even though that would be harmless.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index 95fe6e6d3b..93eefd0479 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -320,6 +320,8 @@ my $relfrozenxid = $node->safe_psql('postgres',
q(select relfrozenxid from pg_class where relname = 'test'));
my $datfrozenxid = $node->safe_psql('postgres',
q(select datfrozenxid from pg_database where datname = 'postgres'));
+my $datminmxid = $node->safe_psql('postgres',
+ q(select datminmxid from pg_database where datname = 'postgres'));
# Sanity check that our 'test' table has a relfrozenxid newer than the
# datfrozenxid for the database, and that the datfrozenxid is greater than the
@@ -454,40 +456,39 @@ for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
# Expected corruption report
push @expected,
- qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ qr/${header}xmin $xmin precedes relation freeze threshold \d+/;
}
elsif ($offnum == 2)
{
# Corruptly set xmin < datfrozenxid
- my $xmin = 3;
+ my $xmin = $datfrozenxid - 12;
$tup->{t_xmin} = $xmin;
$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
push @expected,
- qr/${$header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ qr/${$header}xmin $xmin precedes oldest valid transaction ID \d+/;
}
elsif ($offnum == 3)
{
- # Corruptly set xmin < datfrozenxid, further back, noting circularity
- # of xid comparison.
- my $xmin = 4026531839;
+ # Corruptly set xmin > next transaction id.
+ my $xmin = $relfrozenxid + 1000000;
$tup->{t_xmin} = $xmin;
$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
push @expected,
- qr/${$header}xmin ${xmin} precedes oldest valid transaction ID 0:\d+/;
+ qr/${$header}xmin $xmin equals or exceeds next valid transaction ID \d+/;
}
elsif ($offnum == 4)
{
- # Corruptly set xmax < relminmxid;
- my $xmax = 4026531839;
+ # Corruptly set xmax > next transaction id.
+ my $xmax = $relfrozenxid + 1000000;
$tup->{t_xmax} = $xmax;
$tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
push @expected,
- qr/${$header}xmax ${xmax} precedes oldest valid transaction ID 0:\d+/;
+ qr/${$header}xmax $xmax equals or exceeds next valid transaction ID \d+/;
}
elsif ($offnum == 5)
{
@@ -590,31 +591,33 @@ for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
# Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
$tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
$tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
- $tup->{t_xmax} = 4;
+ my $xmax = $datminmxid + 1000000;
+ $tup->{t_xmax} = $xmax;
push @expected,
- qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ qr/${header}multitransaction ID $xmax equals or exceeds next valid multitransaction ID \d+/;
}
elsif ($offnum == 15)
{
# Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
$tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
$tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
- $tup->{t_xmax} = 4000000000;
+ my $xmax = $datminmxid - 10;
+ $tup->{t_xmax} = $xmax;
push @expected,
- qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ qr/${header}multitransaction ID $xmax precedes relation minimum multitransaction ID threshold \d+/;
}
elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
{
# Corruptly set xmin > next_xid to be in the future.
- my $xmin = 123456;
+ my $xmin = $relfrozenxid + 1000000;
$tup->{t_xmin} = $xmin;
$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
push @expected,
- qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/;
+ qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID \d+/;
}
elsif ($offnum == 17)
{
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..218afeeb3b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint32 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* set by initdb, >= FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index e5526c7565..79df6faeb9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -643,7 +643,9 @@ sub init
{
note("initializing database system by running initdb");
PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
- 'trust', '-N', @{ $params{extra} });
+ 'trust', '-N',
+ '-x', '124983', '-m', '242236', '-o', '359488',
+ @{ $params{extra} });
}
else
{
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 0e40ed32a2..3511c4b500 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2333,7 +2333,8 @@ regression_main(int argc, char *argv[],
note("initializing database system by running initdb");
appendStringInfo(&cmd,
- "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
+ "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync"
+ " -x 124983 -m 242236 -o 359488",
bindir ? bindir : "",
bindir ? "/" : "",
temp_instance);
diff --git a/src/test/xid-64/t/001_test_large_xids.pl b/src/test/xid-64/t/001_test_large_xids.pl
new file mode 100644
index 0000000000..4c7dbc6cb1
--- /dev/null
+++ b/src/test/xid-64/t/001_test_large_xids.pl
@@ -0,0 +1,54 @@
+# Tests for large xid values
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use bigint;
+
+sub command_output
+{
+ my ($cmd) = @_;
+ my ($stdout, $stderr);
+ print("# Running: " . join(" ", @{$cmd}) . "\n");
+ my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ ok($result, "@$cmd exit code 0");
+ is($stderr, '', "@$cmd no stderr");
+ return $stdout;
+}
+
+my $START_VAL = 2**32;
+my $MAX_VAL = 2**62;
+
+my $ixid = $START_VAL + int(rand($MAX_VAL - $START_VAL));
+my $imxid = $START_VAL + int(rand($MAX_VAL - $START_VAL));
+my $imoff = $START_VAL + int(rand($MAX_VAL - $START_VAL));
+
+# Initialize master node with the random xid-related parameters
+my $node = PostgreSQL::Test::Cluster->new('master');
+$node->init(extra => [ "--xid=$ixid", "--multixact-id=$imxid", "--multixact-offset=$imoff" ]);
+$node->start;
+
+# Check the xid-related parameters reported by pg_controldata
+my $pgcd_output = command_output(
+ [ 'pg_controldata', '-D', $node->data_dir ] );
+print($pgcd_output); print("\n");
+ok($pgcd_output =~ qr/Latest checkpoint's NextXID:\s*(\d+)/, "XID found");
+my ($nextxid) = ($1);
+ok($nextxid >= $ixid && $nextxid < $ixid + 1000,
+ "Latest checkpoint's NextXID ($nextxid) is close to the initial xid ($ixid).");
+ok($pgcd_output =~ qr/Latest checkpoint's NextMultiXactId:\s*(\d+)/, "MultiXactId found");
+my ($nextmxid) = ($1);
+ok($nextmxid >= $imxid && $nextmxid < $imxid + 1000,
+ "Latest checkpoint's NextMultiXactId ($nextmxid) is close to the initial multiXactId ($imxid).");
+ok($pgcd_output =~ qr/Latest checkpoint's NextMultiOffset:\s*(\d+)/, "MultiOffset found");
+my ($nextmoff) = ($1);
+ok($nextmoff >= $imoff && $nextmoff < $imoff + 1000,
+ "Latest checkpoint's NextMultiOffset ($nextmoff) is close to the initial multiOffset ($imoff).");
+
+# Run pgbench to check whether the database is working properly
+$node->command_ok(
+ [ qw(pgbench --initialize --no-vacuum --scale=10) ],
+ 'pgbench finished without errors');
+
+done_testing();
\ No newline at end of file
--
2.43.0