public inbox for [email protected]
Re: POC: make mxidoff 64 bits
21+ messages / 6 participants
* Re: POC: make mxidoff 64 bits
@ 2024-04-25 14:20 Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-04-25 14:20 UTC (permalink / raw)
To: wenhui qiu <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
On Tue, 23 Apr 2024 at 12:37, Heikki Linnakangas <[email protected]> wrote:
> This is really a bug fix. It didn't matter when TransactionId and
> MultiXactOffset were both typedefs of uint32, but it was always wrong.
> The argument name 'xid' is also misleading.
>
> I think there are some more like that, MXOffsetToFlagsBitShift for example.
Yeah, I always thought so too. I believe this is just a copy-paste. Do you
mean it is worth creating a separate CF entry for these fixes?
On Tue, 23 Apr 2024 at 16:03, Andrey M. Borodin <[email protected]>
wrote:
> BTW as a side note... I see lots of casts to (unsigned long long), can't
> we just cast to MultiXactOffset?
>
Actually, the first versions of the 64xid patch set had such casts to the
types TransactionId, MultiXact and so on. But after some discussion we
switched to the unsigned long long cast. Unfortunately, I could not find an
exact link for that discussion. On the other hand, such casting is already
used throughout the code. So, just for the sake of consistency, I would like
to stay with these casts.
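For readers unfamiliar with the convention, here is a minimal sketch of the
cast being discussed. It is an illustration only: the typedef is a stand-in
for the widened PostgreSQL type, and format_offset is a hypothetical helper,
not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: stand-in for the widened PostgreSQL typedef. */
typedef uint64_t MultiXactOffset;

/*
 * Hypothetical helper: format an offset with "%llu" plus an explicit cast.
 * Returns the number of characters written, excluding the terminating NUL.
 */
static int
format_offset(char *buf, size_t buflen, MultiXactOffset offset)
{
    assert(buf != NULL && buflen > 0);

    /*
     * uint64_t may be "unsigned long" or "unsigned long long" depending on
     * the platform. The cast makes the argument match %llu in both cases.
     */
    return snprintf(buf, buflen, "offset %llu", (unsigned long long) offset);
}
```

Casting to MultiXactOffset instead would leave the format string dependent
on how that typedef is defined on each platform, which is presumably why the
explicit unsigned long long cast is used.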
On Tue, 23 Apr 2024 at 16:03, wenhui qiu <[email protected]> wrote:
> Hi Maxim Orlov
> Thank you so much for your tireless work on this. Increasing the WAL
> size by a few bytes should have very little impact with today's disk
> performance. (Logical replication also increased the WAL volume a lot;
> it is a milestone feature, and the community has kept improving it.)
> I believe removing the troublesome PostgreSQL transaction ID wraparound
> would also be a milestone feature, so adding a few bytes is worth it!
>
I 100% agree. Maybe I should return to this approach and find some
benefits of having FXIDs in WAL.
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-08-14 15:30 ` Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-08-14 15:30 UTC (permalink / raw)
To: wenhui qiu <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
Hi!
Sorry for the delay. I was a bit busy last month. Anyway, here is my
proposal for making multixact offsets 64-bit.
The patch set consists of three parts:
0001 - making user output of offsets 64-bit ready;
0002 - making offsets 64-bit;
0003 - providing 32-bit to 64-bit conversion in pg_upgrade.
I'm pretty sure this is just the beginning of the conversation, so any
opinions and reviews, as always, are very welcome!
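To give a rough idea of what a 32-bit to 64-bit conversion has to account
for, here is a minimal sketch. It is not the code from the 0003 patch: the
names widen_offsets and new_pages_needed are hypothetical, and the default
8 kB BLCKSZ is assumed. It only shows that each entry doubles in size, so one
old offsets page fills two new ones. It ignores any rebasing of offset values
that the real conversion may perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192             /* assumption: default PostgreSQL block size */
#define OLD_OFFSETS_PER_PAGE (BLCKSZ / sizeof(uint32_t))
#define NEW_OFFSETS_PER_PAGE (BLCKSZ / sizeof(uint64_t))

/* Hypothetical helper: widen n 32-bit offsets into 64-bit entries. */
static void
widen_offsets(const uint32_t *src, uint64_t *dst, size_t n)
{
    assert(n == 0 || (src != NULL && dst != NULL));

    for (size_t i = 0; i < n; i++)
        dst[i] = (uint64_t) src[i];     /* zero-extend, value unchanged */
}

/*
 * Hypothetical helper: number of 64-bit pages needed to hold every entry
 * stored in old_pages pages of 32-bit offsets (rounded up).
 */
static uint64_t
new_pages_needed(uint64_t old_pages)
{
    uint64_t entries = old_pages * OLD_OFFSETS_PER_PAGE;

    return (entries + NEW_OFFSETS_PER_PAGE - 1) / NEW_OFFSETS_PER_PAGE;
}
```

With these assumptions a full 32-page segment of old offsets needs 64 new
pages, that is, two new segments.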
--
Best regards,
Maxim Orlov.
Attachments:
[application/x-patch] v1-0002-Use-64-bit-multixact-offsets.patch (14.3K, 2-v1-0002-Use-64-bit-multixact-offsets.patch)
From 2e1f05b3b0504153e57188e968bb19cb6741c087 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v1 2/3] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 182 ++-----------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 16 insertions(+), 174 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 57c5148933..f2a2aa9547 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -95,14 +95,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -174,7 +166,7 @@ MXOffsetToMemberPage(MultiXactOffset offset)
return offset / MULTIXACT_MEMBERS_PER_PAGE;
}
-static inline int
+static inline int64
MXOffsetToMemberSegment(MultiXactOffset offset)
{
return MXOffsetToMemberPage(offset) / SLRU_PAGES_PER_SEGMENT;
@@ -271,9 +263,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -408,8 +397,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1158,78 +1145,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1968,7 +1883,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2713,8 +2628,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2729,7 +2642,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2760,11 +2672,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2777,24 +2685,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2804,14 +2695,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2821,54 +2710,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2990,8 +2831,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3041,10 +2883,10 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int64 segpage, void *data
static void
PerformMembersTruncation(MultiXactOffset oldestOffset, MultiXactOffset newOldestOffset)
{
- const int maxsegment = MXOffsetToMemberSegment(MaxMultiXactOffset);
- int startsegment = MXOffsetToMemberSegment(oldestOffset);
- int endsegment = MXOffsetToMemberSegment(newOldestOffset);
- int segment = startsegment;
+ const int64 maxsegment = MXOffsetToMemberSegment(MaxMultiXactOffset);
+ int64 startsegment = MXOffsetToMemberSegment(oldestOffset);
+ int64 endsegment = MXOffsetToMemberSegment(newOldestOffset);
+ int64 segment = startsegment;
/*
* Delete all the segments but the last one. The last segment can still
@@ -3337,7 +3179,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index dc1841346c..ccfb82b478 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -661,7 +661,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.45.2
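Editorial illustration, not part of the patch: the change to
MultiXactOffsetPrecedes above swaps a signed 32-bit difference for a signed
64-bit one. The sketch below uses hypothetical function names to show how
the two variants order the same pair of values differently once the old
32-bit space would have wrapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Old behaviour: modulo-2^32 ordering via a signed 32-bit difference.
 * The unsigned-to-signed cast relies on two's-complement conversion,
 * the same assumption the patched function makes.
 */
static bool
offset_precedes32(uint32_t a, uint32_t b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/* New behaviour: the same idiom on 64-bit values. */
static bool
offset_precedes64(uint64_t a, uint64_t b)
{
    int64_t diff = (int64_t) (a - b);

    return diff < 0;
}

/* Small self-check contrasting the two variants. */
static void
check_offset_ordering(void)
{
    /* In 32 bits, a value just below 2^32 "precedes" a small one. */
    assert(offset_precedes32(0xFFFFFFF0u, 0x10u));

    /* In 64 bits, the same numbers are far from wrapping. */
    assert(!offset_precedes64(UINT64_C(0xFFFFFFF0), UINT64_C(0x10)));
}
```

The 64-bit variant still uses modular arithmetic, but the wrap point moves
from 2^31 to 2^63 entries apart.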
[application/x-patch] v1-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 3-v1-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
From 95226756a225ca6b95e2baafff502034c355310d Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v1 1/3] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c601ff98a1..57c5148933 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1258,7 +1258,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2285,8 +2286,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2320,8 +2322,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2511,8 +2513,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3203,11 +3205,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3463,11 +3466,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index ad817fbca6..388037a94b 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -877,8 +877,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.45.2
[application/x-patch] v1-0003-Make-pg_upgrade-convert-multixact-offsets.patch (12.9K, 4-v1-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
From 063ec2662d94f7a72e3162702c4051f34cd67000 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v1 3/3] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 29 ++-
src/bin/pg_upgrade/pg_upgrade.h | 13 +-
src/bin/pg_upgrade/segresize.c | 350 +++++++++++++++++++++++++++++++
src/include/catalog/catversion.h | 2 +-
6 files changed, 391 insertions(+), 5 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index bde91e2beb..030816596f 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 9825fa3305..2d9f7e6b65 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..d9d8d0ea78 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,7 +750,30 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ uint64 oldest_offset = convert_multixact_offsets();
+
+ if (oldest_offset)
+ {
+ uint64 next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+
+ /* Handle possible wraparound. */
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+
copy_subdir_files("pg_multixact/members", "pg_multixact/members");
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +783,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index cdb6e2b759..37d173cb86 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202408123
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -494,3 +501,7 @@ void parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr
char *old_pgdata, char *new_pgdata,
char *old_tablespace);
bool reap_child(bool wait_for_child);
+
+/* segresize.c */
+
+uint64 convert_multixact_offsets(void);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..e47c0a2407
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,350 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+ bool long_segment_names;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ *
+ * NOTE: this function should mirror SlruFileName call.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ if (state->long_segment_names)
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+ return psprintf("%s/%015llX", state->dir, (long long) state->segno);
+ }
+ else
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+ }
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file does exist, read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found, which avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ state->pageno++;
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ state->segno++;
+ state->pageno = 0;
+ close_segment(state);
+ }
+}
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+uint64
+convert_multixact_offsets(void)
+{
+ /* See multixact.c */
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(uint32))
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
+
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ uint32 oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE] = {0};
+ /*
+ * It is much easier to deal with multi wraparound in 64-bit format. Thus
+ * we hold multixact IDs in 64-bit variables, although they remain 32 bits.
+ */
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool found = false;
+ uint64 oldest_offset = 0;
+
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ oldseg.long_segment_names = false; /* old format XXXX */
+
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+ newseg.long_segment_names = true;
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound. */
+ if (oldseg.segno > MaxMultiXactId /
+ MULTIXACT_OFFSETS_PER_PAGE_OLD /
+ SLRU_PAGES_PER_SEGMENT)
+ oldseg.segno = 0;
+
+ /* Read old offset segment. */
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (oldlen <= 0 || empty)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Fill possible gap. */
+ if (oldlen < BLCKSZ)
+ memset((char *) oldbuf + oldlen, 0, BLCKSZ - oldlen);
+
+ /* Save oldest multi offset */
+ if (!found)
+ {
+ oldest_offset = oldbuf[old_entry];
+ found = true;
+ }
+
+ /* ... skip wrapped-around invalid multi */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page. */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound. */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1. */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE)
+ {
+ /* Write a new page. */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page. */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Release resources. */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ check_ok();
+
+ return oldest_offset;
+}
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 9a0ae27823..f29dc9fc92 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202408122
+#define CATALOG_VERSION_NO 202408123
#endif
--
2.45.2
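
A side note for readers of the patch above: the offset rebasing done in
convert_multixact_offsets() can be tried in isolation. The following is only a
minimal sketch, not code from the patch. The helper name rebase_offset() is
hypothetical, and it assumes, as the constant in the patch suggests, that old
32-bit offsets skip 0 when they wrap, so the wraparound period is 2^32 - 1.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): rebase an old 32-bit
 * multixact offset so that the oldest surviving offset maps to 1.
 *
 * Assumption: old offsets skip 0 on wraparound, so the period is
 * 2^32 - 1, the same constant the patch adds.
 */
static uint64_t
rebase_offset(uint32_t old_offset, uint32_t oldest_offset)
{
	uint64_t	offset = old_offset;

	/* An offset below the oldest one must have wrapped around. */
	if (offset < oldest_offset)
		offset += (UINT64_C(1) << 32) - 1;

	/* After the adjustment the subtraction below cannot underflow. */
	assert(offset >= oldest_offset);

	/* Shift so that the oldest offset becomes 1. */
	return offset - oldest_offset + 1;
}
```

With oldest_offset = 0xFFFFFFF0, the old offset 5 has wrapped and lands 21
entries after the start, which is what the sketch computes.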
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-09-03 13:30 ` Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-09-03 13:30 UTC (permalink / raw)
To: wenhui qiu <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
Here is a rebase. Apparently I'll have to do this often, since the patch
changes CATALOG_VERSION_NO.
--
Best regards,
Maxim Orlov.
Attachments:
[application/octet-stream] v2-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 3-v2-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
From 228a21532bb441fe582a66b7404962ce5bf4b18b Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v2 1/3] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 178491f6f5..0c5980a436 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -877,8 +877,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.45.2
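
To illustrate the convention this patch applies everywhere: a 64-bit value is
cast to unsigned long long and printed with %llu, because passing it to %u
would not match the argument type. The following is a sketch only. The typedef
MultiXactOffsetSketch and the function format_next_offset() are hypothetical
stand-ins, not PostgreSQL names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 64-bit MultiXactOffset. */
typedef uint64_t MultiXactOffsetSketch;

/*
 * Format an offset the way the patched printf() calls do: cast to
 * unsigned long long and use %llu, which is portable regardless of
 * whether uint64_t is "unsigned long" or "unsigned long long".
 *
 * Returns the number of characters snprintf() would have written.
 */
static int
format_next_offset(char *buf, size_t buflen, MultiXactOffsetSketch off)
{
	assert(buf != NULL && buflen > 0);

	return snprintf(buf, buflen, "NextMultiOffset: %llu",
					(unsigned long long) off);
}
```

The cast keeps one format string valid on every platform, which is presumably
why the thread settled on it over casting to the typedef itself.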
[application/octet-stream] v2-0003-Make-pg_upgrade-convert-multixact-offsets.patch (12.9K, 4-v2-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
From a06557435597868cc654de2899b6cd618fed641c Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v2 3/3] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 29 ++-
src/bin/pg_upgrade/pg_upgrade.h | 13 +-
src/bin/pg_upgrade/segresize.c | 350 +++++++++++++++++++++++++++++++
src/include/catalog/catversion.h | 2 +-
6 files changed, 391 insertions(+), 5 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index bde91e2beb..030816596f 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 9825fa3305..2d9f7e6b65 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..d9d8d0ea78 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,7 +750,30 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ uint64 oldest_offset = convert_multixact_offsets();
+
+ if (oldest_offset)
+ {
+ uint64 next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+
+ /* Handle possible wraparound. */
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+
copy_subdir_files("pg_multixact/members", "pg_multixact/members");
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +783,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index cdb6e2b759..445b46e5bd 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202408302
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -494,3 +501,7 @@ void parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr
char *old_pgdata, char *new_pgdata,
char *old_tablespace);
bool reap_child(bool wait_for_child);
+
+/* segresize.c */
+
+uint64 convert_multixact_offsets(void);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..e47c0a2407
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,350 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+ bool long_segment_names;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ *
+ * NOTE: this function should mirror SlruFileName() in slru.c.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ if (state->long_segment_names)
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+ return psprintf("%s/%015llX", state->dir, (long long) state->segno);
+ }
+ else
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+ }
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file does exist, read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found, which avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ state->pageno++;
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ state->segno++;
+ state->pageno = 0;
+ close_segment(state);
+ }
+}
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+uint64
+convert_multixact_offsets(void)
+{
+ /* See multixact.c */
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(uint32))
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
+
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ uint32 oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE] = {0};
+ /*
+ * It is much easier to deal with multi wraparound in 64-bit format. Thus
+ * we hold multixact IDs in 64-bit variables, although they remain 32 bits.
+ */
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool found = false;
+ uint64 oldest_offset = 0;
+
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ oldseg.long_segment_names = false; /* old format XXXX */
+
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+ newseg.long_segment_names = true;
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound. */
+ if (oldseg.segno > MaxMultiXactId /
+ MULTIXACT_OFFSETS_PER_PAGE_OLD /
+ SLRU_PAGES_PER_SEGMENT)
+ oldseg.segno = 0;
+
+ /* Read old offset segment. */
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (oldlen <= 0 || empty)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Fill possible gap. */
+ if (oldlen < BLCKSZ)
+ memset((char *) oldbuf + oldlen, 0, BLCKSZ - oldlen);
+
+ /* Save oldest multi offset */
+ if (!found)
+ {
+ oldest_offset = oldbuf[old_entry];
+ found = true;
+ }
+
+ /* ... skip wrapped-around invalid multi */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page. */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound. */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1. */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE)
+ {
+ /* Write a new page. */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page. */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Release resources. */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ check_ok();
+
+ return oldest_offset;
+}
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 1980d492c3..7b1cd22d1a 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202408301
+#define CATALOG_VERSION_NO 202408302
#endif
--
2.45.2
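
For readers following the setup code in convert_multixact_offsets(): the same
multixact ID lands in a different segment, page and slot once each entry grows
from 4 to 8 bytes. The following is a rough sketch of that arithmetic, not code
from the patch. SKETCH_BLCKSZ assumes the default 8192-byte block size, and
OffsetSlot and locate_offset_slot() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ				8192	/* assumption: default BLCKSZ */
#define SKETCH_PAGES_PER_SEGMENT	32		/* same value as in the patch */

/* Hypothetical result type: where one multixact's offset entry lives. */
typedef struct OffsetSlot
{
	uint64_t	segno;		/* segment file number */
	uint64_t	pageno;		/* page within the segment */
	uint64_t	entry;		/* entry index within the page */
} OffsetSlot;

/*
 * Locate the offsets-SLRU slot of a multixact ID for a given entry size:
 * 4 bytes for the old layout, 8 bytes for the new one.
 */
static OffsetSlot
locate_offset_slot(uint64_t multi, size_t entry_size)
{
	uint64_t	per_page;
	uint64_t	page;
	OffsetSlot	slot;

	assert(entry_size == sizeof(uint32_t) || entry_size == sizeof(uint64_t));

	per_page = SKETCH_BLCKSZ / entry_size;
	page = multi / per_page;

	slot.segno = page / SKETCH_PAGES_PER_SEGMENT;
	slot.pageno = page % SKETCH_PAGES_PER_SEGMENT;
	slot.entry = multi % per_page;

	return slot;
}
```

Under these assumptions multixact 100000 sits in segment 1, page 16, entry 1696
with 4-byte entries, and in segment 3, page 1, entry 672 with 8-byte entries,
which is why the segments have to be rewritten rather than copied.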
[application/octet-stream] v2-0002-Use-64-bit-multixact-offsets.patch (13.3K, 5-v2-0002-Use-64-bit-multixact-offsets.patch)
From 3afd483c1a2a505e14603da759adcefd7130fff9 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v2 2/3] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index dc1841346c..ccfb82b478 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -661,7 +661,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.45.2
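
For readers following the last multixact.c hunk above, here is a minimal standalone sketch of the wraparound-aware comparison that MultiXactOffsetPrecedes performs once the offset is 64 bits wide. The typedef below is a stand-in for the patched one in c.h, and the names offset_precedes and offset_precedes_examples are hypothetical, chosen only for this illustration. The logic mirrors the int64 cast in the patch, and assumes the usual two's-complement conversion from unsigned to signed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the patched typedef in c.h (assumption for this sketch). */
typedef uint64_t MultiXactOffset;

/*
 * Return true if offset1 logically precedes offset2.
 *
 * The unsigned subtraction wraps modulo 2^64.  Reinterpreting the result
 * as a signed value makes offsets that are less than 2^63 apart compare
 * correctly even across the wraparound point.
 */
bool
offset_precedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
	int64_t		diff = (int64_t) (offset1 - offset2);

	return diff < 0;
}

/* A few illustrative checks; call from a test harness or from main(). */
void
offset_precedes_examples(void)
{
	assert(offset_precedes(1, 2));
	assert(!offset_precedes(2, 1));
	assert(!offset_precedes(5, 5));
	/* UINT64_MAX sits logically just before 0 once the counter wraps. */
	assert(offset_precedes(UINT64_MAX, 0));
	assert(!offset_precedes(0, UINT64_MAX));
}
```

With a 32-bit cast the same comparison would only be valid for offsets less than 2^31 apart, which is why the cast has to widen together with the typedef.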
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-09-03 13:32 ` Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Alexander Korotkov @ 2024-09-03 13:32 UTC (permalink / raw)
To: Maxim Orlov <[email protected]>; +Cc: wenhui qiu <[email protected]>; Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
On Tue, Sep 3, 2024 at 4:30 PM Maxim Orlov <[email protected]> wrote:
> Here is a rebase. Apparently I'll have to do this often, since CATALOG_VERSION_NO is changed in the patch.
I don't think you need to maintain the CATALOG_VERSION_NO change in your
patch, for exactly the reason you mentioned: the patch will conflict
each time CATALOG_VERSION_NO is advanced. It's the committer's
responsibility to advance CATALOG_VERSION_NO when needed.
------
Regards,
Alexander Korotkov
Supabase
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
@ 2024-09-04 08:49 ` Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-09-04 08:49 UTC (permalink / raw)
To: Alexander Korotkov <[email protected]>; +Cc: wenhui qiu <[email protected]>; Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
On Tue, 3 Sept 2024 at 16:32, Alexander Korotkov <[email protected]>
wrote:
> I don't think you need to maintain the CATALOG_VERSION_NO change in your
> patch, for exactly the reason you mentioned: the patch will conflict
> each time CATALOG_VERSION_NO is advanced. It's the committer's
> responsibility to advance CATALOG_VERSION_NO when needed.
>
OK, got it. My intention was to make the patch easier to test: anyone who
wants to look at it would not need to change the code themselves. In the
next iteration, I'll remove the CATALOG_VERSION_NO change.
--
Best regards,
Maxim Orlov.
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-09-07 04:36 ` Maxim Orlov <[email protected]>
2024-09-12 12:09 ` Re: POC: make mxidoff 64 bits Pavel Borisov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
0 siblings, 2 replies; 21+ messages in thread
From: Maxim Orlov @ 2024-09-07 04:36 UTC (permalink / raw)
To: Alexander Korotkov <[email protected]>; +Cc: wenhui qiu <[email protected]>; Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
Here is v3. I removed the CATALOG_VERSION_NO change, so the version bump is
left to the committer.
--
Best regards,
Maxim Orlov.
Attachments:
[application/octet-stream] v3-0002-Use-64-bit-multixact-offsets.patch (13.3K, 3-v3-0002-Use-64-bit-multixact-offsets.patch)
From 231886c2fafe9eb2d8535c4b590e387085d7aec7 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v3 2/3] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index dc1841346c..ccfb82b478 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -661,7 +661,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.45.2
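
The one-line change to the 001_basic.pl test above (dividing by 8 instead of 4) follows from the entry size: each entry in pg_multixact/offsets is one MultiXactOffset, so doubling its width halves the number of entries per segment file. A rough sketch of that arithmetic follows. The SKETCH_ constants are assumptions for illustration (the default 8192-byte block size and the 32 pages per segment that the test itself uses), and entries_per_segment is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; real builds may use another BLCKSZ. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_SLRU_PAGES_PER_SEGMENT 32

/* Number of fixed-size entries that fit in one SLRU segment file. */
uint64_t
entries_per_segment(size_t entry_size)
{
	/* The entry size must be non-zero and divide the page evenly. */
	assert(entry_size > 0 && SKETCH_BLCKSZ % entry_size == 0);

	return (uint64_t) SKETCH_SLRU_PAGES_PER_SEGMENT *
		(SKETCH_BLCKSZ / entry_size);
}
```

Under these assumptions, entries_per_segment(sizeof(uint32_t)) gives 65536 and entries_per_segment(sizeof(uint64_t)) gives 32768, which corresponds to the old and new values of $mult in the test.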
[application/octet-stream] v3-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 4-v3-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
From cc588f091a2c1970849a6e341ca1a8a79fc1a935 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v3 1/3] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 178491f6f5..0c5980a436 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -877,8 +877,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.45.2
[application/octet-stream] v3-0003-Make-pg_upgrade-convert-multixact-offsets.patch (12.6K, 5-v3-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
From 78cba2fcfbe11451ec6b8cd6e4c48b315571ab0d Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v3 3/3] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 29 ++-
src/bin/pg_upgrade/pg_upgrade.h | 13 +-
src/bin/pg_upgrade/segresize.c | 350 ++++++++++++++++++++++++++++++++
5 files changed, 390 insertions(+), 4 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index bde91e2beb..030816596f 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 9825fa3305..2d9f7e6b65 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..d9d8d0ea78 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,7 +750,30 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ uint64 oldest_offset = convert_multixact_offsets();
+
+ if (oldest_offset)
+ {
+ uint64 next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+
+ /* Handle possible wraparound. */
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+
copy_subdir_files("pg_multixact/members", "pg_multixact/members");
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +783,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index cdb6e2b759..157e59e38f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -494,3 +501,7 @@ void parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr
char *old_pgdata, char *new_pgdata,
char *old_tablespace);
bool reap_child(bool wait_for_child);
+
+/* segresize.c */
+
+uint64 convert_multixact_offsets(void);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..e47c0a2407
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,350 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+ bool long_segment_names;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ *
+ * NOTE: this function should mirror SlruFileName call.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ if (state->long_segment_names)
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+ return psprintf("%s/%015llX", state->dir, (long long) state->segno);
+ }
+ else
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+ }
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file does exist, read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip few first segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we still didn't. Creation is
+ * postponed until the first non-empty page is found. This helps
+ * not to create completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ state->pageno++;
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ state->segno++;
+ state->pageno = 0;
+ close_segment(state);
+ }
+}
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+uint64
+convert_multixact_offsets(void)
+{
+ /* See multixact.c */
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(uint32))
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
+
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ uint32 oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE] = {0};
+ /*
+ * It is much easier to deal with multi wraparound in 64-bit format. Thus
+ * we use 64 bits for multi-transactions, although they remain 32 bits.
+ */
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool found = false;
+ uint64 oldest_offset = 0;
+
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ oldseg.long_segment_names = false; /* old format XXXX */
+
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+ newseg.long_segment_names = true;
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound. */
+ if (oldseg.segno > MaxMultiXactId /
+ MULTIXACT_OFFSETS_PER_PAGE_OLD /
+ SLRU_PAGES_PER_SEGMENT)
+ oldseg.segno = 0;
+
+ /* Read old offset segment. */
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (oldlen <= 0 || empty)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Fill possible gap. */
+ if (oldlen < BLCKSZ)
+ memset((char *) oldbuf + oldlen, 0, BLCKSZ - oldlen);
+
+ /* Save oldest multi offset */
+ if (!found)
+ {
+ oldest_offset = oldbuf[old_entry];
+ found = true;
+ }
+
+ /* ... skip wrapped-around invalid multi */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page. */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound. */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1. */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE)
+ {
+ /* Write a new page. */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page. */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Release resources. */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ check_ok();
+
+ return oldest_offset;
+}
--
2.45.2
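The offset rebasing done by this patch can be looked at in isolation. Below is a minimal sketch, assuming the same arithmetic as in convert_multixact_offsets(): old 32-bit offsets are shifted so that the oldest one becomes 1, and an offset smaller than the oldest one is treated as having wrapped around. The name rebase_offset is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: map an old 32-bit member
 * offset to the new 64-bit numbering, mirroring the arithmetic used in
 * convert_multixact_offsets().
 */
static uint64_t
rebase_offset(uint32_t old_offset, uint32_t oldest_offset)
{
    uint64_t    offset = old_offset;

    /* Mirrors the "if (oldest_offset)" guard in copy_xact_xlog_xid(). */
    assert(oldest_offset != 0);

    /* An offset below the oldest one is assumed to have wrapped around. */
    if (offset < oldest_offset)
        offset += ((uint64_t) 1 << 32) - 1;

    /* Subtract oldest_offset, so that the new offsets start from 1. */
    return offset - oldest_offset + 1;
}
```

With oldest_offset = 0xFFFFFFF0, the oldest offset itself maps to 1, 0xFFFFFFFF maps to 16, and a wrapped-around offset of 5 maps to 21.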
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-09-12 12:09 ` Pavel Borisov <[email protected]>
2024-09-12 12:25 ` Re: POC: make mxidoff 64 bits Pavel Borisov <[email protected]>
2024-09-12 13:14 ` Re: POC: make mxidoff 64 bits Alvaro Herrera <[email protected]>
1 sibling, 2 replies; 21+ messages in thread
From: Pavel Borisov @ 2024-09-12 12:09 UTC (permalink / raw)
To: Maxim Orlov <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; wenhui qiu <[email protected]>; Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
Hi, Maxim!
Previously we accessed offsets in the shared MultiXactState without locks, as
a 32-bit read is always atomic. But I'm not sure that still holds once the
offset becomes 64-bit.
E.g. in GetNewMultiXactId():
nextOffset = MultiXactState->nextOffset;
is done outside the lock.
There might be other places where we do the same.
Regards,
Pavel Borisov
Supabase
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
From: Pavel Borisov @ 2024-09-12 12:25 UTC (permalink / raw)
To: Maxim Orlov <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; wenhui qiu <[email protected]>; Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
On Thu, 12 Sept 2024 at 16:09, Pavel Borisov <[email protected]> wrote:
> Hi, Maxim!
>
> Previously we accessed offsets in shared MultiXactState without locks as
> 32-bit read is always atomic. But I'm not sure it's so when offset become
> 64-bit.
> E.g. GetNewMultiXactId():
>
> nextOffset = MultiXactState->nextOffset;
> is outside lock.
>
> There might be other places we do the same as well.
>
I think replacing the plain assignments with
pg_atomic_read_u64/pg_atomic_write_u64 would be sufficient.
(I think the same is needed for the patch set [1].)
[1]
https://www.postgresql.org/message-id/flat/CAJ7c6TMvPz8q+nC=JoKniy7yxPzQYcCTnNFYmsDP-nnWsAOJ2g@mail....
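For illustration, here is a minimal sketch of that idea. It uses C11 <stdatomic.h> as a stand-in, because the pg_atomic_* API needs the PostgreSQL headers; in the backend the field would presumably be a pg_atomic_uint64 accessed through pg_atomic_read_u64()/pg_atomic_write_u64(). The struct and function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the nextOffset field of shared MultiXactState. */
typedef struct DemoMultiXactState
{
    _Atomic uint64_t nextOffset;
} DemoMultiXactState;

/* Static storage, so the counter starts out as zero. */
static DemoMultiXactState demo_state;

/* Writer side: all 64 bits are stored in one indivisible operation. */
static void
demo_set_next_offset(DemoMultiXactState *state, uint64_t offset)
{
    atomic_store(&state->nextOffset, offset);
}

/* Reader side: never observes a half-updated (torn) value. */
static uint64_t
demo_get_next_offset(DemoMultiXactState *state)
{
    return atomic_load(&state->nextOffset);
}
```

This only rules out torn reads of the single field. Whether a lock is still needed to keep it consistent with other fields is a separate question.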
Regards,
Pavel Borisov
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
From: Alvaro Herrera @ 2024-09-12 13:14 UTC (permalink / raw)
To: Pavel Borisov <[email protected]>; +Cc: Maxim Orlov <[email protected]>; Alexander Korotkov <[email protected]>; wenhui qiu <[email protected]>; Heikki Linnakangas <[email protected]>; Postgres hackers <[email protected]>
On 2024-Sep-12, Pavel Borisov wrote:
> Hi, Maxim!
>
> Previously we accessed offsets in shared MultiXactState without locks as
> 32-bit read is always atomic. But I'm not sure it's so when offset become
> 64-bit.
> E.g. GetNewMultiXactId():
>
> nextOffset = MultiXactState->nextOffset;
> is outside lock.
Good thought. But fortunately I think it's not a problem. The one you
mention is done with MultiXactGenLock held in shared mode -- and that works OK,
as the assignment (in line 1263 at the bottom of the same routine) is
done with the exclusive lock held.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
From: Heikki Linnakangas @ 2024-10-22 09:43 UTC (permalink / raw)
To: Maxim Orlov <[email protected]>; Alexander Korotkov <[email protected]>; +Cc: wenhui qiu <[email protected]>; Postgres hackers <[email protected]>
On 07/09/2024 07:36, Maxim Orlov wrote:
> Here is v3.
MultiXactMemberFreezeThreshold looks quite bogus now. Now that
MaxMultiXactOffset==2^64-1, you cannot get anywhere near the
MULTIXACT_MEMBER_SAFE_THRESHOLD and MULTIXACT_MEMBER_DANGER_THRESHOLD
values anymore. Can we just get rid of MultiXactMemberFreezeThreshold? I
guess it would still be useful to trigger autovacuum if multixact
members grow large though, to release the disk space, even if you can't
run out of members as such anymore. What should the logic for that look
like?
I'd love to see some tests for the pg_upgrade code. Something like a
little perl script to generate test clusters with different wraparound
scenarios etc. using the old version, and a TAP test to run pg_upgrade
on them and verify that queries on the upgraded cluster work correctly.
We don't have tests like that in the repository today, and I don't know
if we'd want to commit these permanently either, but it would be highly
useful now as a one-off thing, to show that the code works.
On upgrade, are there really no changes required to
pg_multixact/members? I imagined that the segment files would need to be
renamed around wraparound, so that if you previously had files like this:
pg_multixact/members/FFFE
pg_multixact/members/FFFF
pg_multixact/members/0000
pg_multixact/members/0001
after upgrade you would need to have:
pg_multixact/members/00000000FFFE
pg_multixact/members/00000000FFFF
pg_multixact/members/000000010000
pg_multixact/members/000000010001
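To make the renaming concrete, here is a minimal sketch. It assumes the numbers of the example above: old segment numbers run from 0x0000 to 0xFFFF, and any old segment numbered below the oldest live segment lies past the wraparound point. The function names and the 12-digit file name width are taken from this illustration only, not from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed for this example: old segment numbers run from 0x0000 to 0xFFFF. */
#define OLD_SEGNO_COUNT UINT64_C(0x10000)

/* Hypothetical helper: compute the segment number to use after upgrade. */
static uint64_t
new_member_segno(uint64_t old_segno, uint64_t oldest_segno)
{
    assert(old_segno < OLD_SEGNO_COUNT && oldest_segno < OLD_SEGNO_COUNT);

    /* Segments numbered below the oldest one were created after the wrap. */
    if (old_segno < oldest_segno)
        return old_segno + OLD_SEGNO_COUNT;
    return old_segno;
}

/* Hypothetical helper: format the new file name with 12 hex digits. */
static void
new_member_segname(char *buf, size_t buflen,
                   uint64_t old_segno, uint64_t oldest_segno)
{
    snprintf(buf, buflen, "%012" PRIX64,
             new_member_segno(old_segno, oldest_segno));
}
```

With FFFE as the oldest segment, FFFE and FFFF keep their numbers, while 0000 and 0001 become 000000010000 and 000000010001.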
Thanks for working on this!
--
Heikki Linnakangas
Neon (https://neon.tech)
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
From: Maxim Orlov @ 2024-10-22 16:33 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; wenhui qiu <[email protected]>; Postgres hackers <[email protected]>
On Tue, 22 Oct 2024 at 12:43, Heikki Linnakangas <[email protected]> wrote:
> MultiXactMemberFreezeThreshold looks quite bogus now. Now that
> MaxMultiXactOffset==2^64-1, you cannot get anywhere near the
> MULTIXACT_MEMBER_SAFE_THRESHOLD and MULTIXACT_MEMBER_DANGER_THRESHOLD
> values anymore. Can we just get rid of MultiXactMemberFreezeThreshold? I
> guess it would still be useful to trigger autovacuum if multixacts
> members grows large though, to release the disk space, even if you can't
> run out of members as such anymore. What should the logic for that look
> like?
>
Yep, you're totally correct. The MultiXactMemberFreezeThreshold call is no
longer necessary and can be safely removed.
I made this a separate commit in v4. But, as you rightly say, it will be
useful to trigger autovacuum in some cases. The obvious
place for this machinery is GetNewMultiXactId. I imagine it as
"if nextOff - oldestOff > threshold, kick autovac". So, the
question is: what kind of threshold do we want here? Is it a hard-coded define
or a GUC? If it is a GUC (32-bit), what value should it have?
There is another issue I feel a little regretful about: we must still
hold MultiXactGenLock in order to track oldestOffset for the
"nextOff - oldestOff" calculation.
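A minimal sketch of that check, under the assumption that the caller has already read both offsets while holding MultiXactGenLock. The function name is hypothetical, and where the threshold comes from (a define or a GUC) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check, not from the patch.  With 64-bit offsets there is no
 * wraparound to handle, so a plain subtraction gives the number of member
 * slots currently in use.
 */
static bool
members_exceed_threshold(uint64_t next_off, uint64_t oldest_off,
                         uint64_t threshold)
{
    assert(next_off >= oldest_off);

    return next_off - oldest_off > threshold;
}
```

A true result would be the point at which to signal autovacuum. The comparison is strict, so a span exactly equal to the threshold does not trigger it.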
>
> I'd love to see some tests for the pg_upgrade code. Something like a
> little perl script to generate test clusters with different wraparound
> scenarios etc.
Agreed. I'll address this as soon as I can.
--
Best regards,
Maxim Orlov.
Attachments:
[application/octet-stream] v4-0004-Make-pg_upgrade-convert-multixact-offsets.patch (12.4K, 3-v4-0004-Make-pg_upgrade-convert-multixact-offsets.patch)
From 7432d8bd1fb2343bd873a21ba757c115d8a2dd59 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v4 4/4] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 29 ++-
src/bin/pg_upgrade/pg_upgrade.h | 13 +-
src/bin/pg_upgrade/segresize.c | 350 ++++++++++++++++++++++++++++++++
5 files changed, 390 insertions(+), 4 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..d9d8d0ea78 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,7 +750,30 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ uint64 oldest_offset = convert_multixact_offsets();
+
+ if (oldest_offset)
+ {
+ uint64 next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+
+ /* Handle possible wraparound. */
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+
copy_subdir_files("pg_multixact/members", "pg_multixact/members");
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +783,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..4d65e4125e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,7 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+uint64 convert_multixact_offsets(void);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..e47c0a2407
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,350 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+ bool long_segment_names;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ *
+ * NOTE: this function should mirror SlruFileName call.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ if (state->long_segment_names)
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+ return psprintf("%s/%015llX", state->dir, (long long) state->segno);
+ }
+ else
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+ }
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file does exist; read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found. This avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ state->pageno++;
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ state->segno++;
+ state->pageno = 0;
+ close_segment(state);
+ }
+}
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+uint64
+convert_multixact_offsets(void)
+{
+ /* See multixact.c */
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(uint32))
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
+
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ uint32 oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE] = {0};
+ /*
+ * It is much easier to deal with multi wraparound in 64-bit format. Thus
+ * we use 64-bit variables for multixact IDs here, although they remain 32 bits.
+ */
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool found = false;
+ uint64 oldest_offset = 0;
+
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ oldseg.long_segment_names = false; /* old format XXXX */
+
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+ newseg.long_segment_names = true;
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound. */
+ if (oldseg.segno > MaxMultiXactId /
+ MULTIXACT_OFFSETS_PER_PAGE_OLD /
+ SLRU_PAGES_PER_SEGMENT)
+ oldseg.segno = 0;
+
+ /* Read old offset segment. */
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (oldlen <= 0 || empty)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Fill possible gap. */
+ if (oldlen < BLCKSZ)
+ memset((char *) oldbuf + oldlen, 0, BLCKSZ - oldlen);
+
+ /* Save oldest multi offset */
+ if (!found)
+ {
+ oldest_offset = oldbuf[old_entry];
+ found = true;
+ }
+
+ /* ... skip wrapped-around invalid multi */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page. */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound. */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1. */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE)
+ {
+ /* Write a new page. */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page. */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Release resources. */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ check_ok();
+
+ return oldest_offset;
+}
--
2.43.0
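
For readers following convert_multixact_offsets() above, here is a minimal standalone sketch of the two pieces of arithmetic it relies on: locating a multixact's entry in the old (4-byte) and new (8-byte) layouts, and rebasing an old 32-bit offset so that new offsets start from 1. The helper names locate_entry() and rebase_offset() and the SlotPos struct are hypothetical and not part of the patch; BLCKSZ is assumed to be the default 8192.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192					/* assumption: default block size */
#define SLRU_PAGES_PER_SEGMENT 32	/* same value the patch copies from slru.h */

/* Hypothetical helper type: position of one offsets entry on disk. */
typedef struct SlotPos
{
	uint64_t	segno;			/* segment number */
	uint64_t	pageno;			/* page within the segment */
	uint64_t	entry;			/* entry within the page */
} SlotPos;

/*
 * Locate the offsets entry for 'multi' when each entry takes 'entry_size'
 * bytes (4 in the old format, 8 in the new one).
 */
static SlotPos
locate_entry(uint64_t multi, uint64_t entry_size)
{
	uint64_t	per_page = BLCKSZ / entry_size;
	uint64_t	page = multi / per_page;
	SlotPos		pos;

	assert(entry_size == 4 || entry_size == 8);

	pos.segno = page / SLRU_PAGES_PER_SEGMENT;
	pos.pageno = page % SLRU_PAGES_PER_SEGMENT;
	pos.entry = multi % per_page;
	return pos;
}

/*
 * Rebase an old 32-bit offset the same way the inner copy loop does:
 * undo a possible 32-bit wraparound, then shift so that the oldest
 * offset maps to 1.
 */
static uint64_t
rebase_offset(uint32_t old_offset, uint32_t oldest_offset)
{
	uint64_t	offset = old_offset;

	if (offset < oldest_offset)
		offset += ((uint64_t) 1 << 32) - 1;

	return offset - oldest_offset + 1;
}
```

Because the new entries are twice as wide, the same multixact lands in a different segment and page after conversion, which is why the patch tracks oldseg and newseg independently.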
[application/octet-stream] v4-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (13.4K, 4-v4-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
From 87978b0164a785d5758c99d892ff0c20e216769c Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 22 Oct 2024 18:53:18 +0300
Subject: [PATCH v4 3/4] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is now UINT64_MAX, the MULTIXACT_MEMBER_SAFE_THRESHOLD and
MULTIXACT_MEMBER_DANGER_THRESHOLD values are no longer meaningful. Thus,
MultiXactMemberFreezeThreshold is not needed either.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 218 +------------------------
src/backend/commands/vacuum.c | 5 +-
src/backend/postmaster/autovacuum.c | 7 +-
src/include/access/multixact.h | 1 -
4 files changed, 7 insertions(+), 224 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c51e03e832..fc7d2cef70 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -250,14 +250,6 @@ typedef struct MultiXactStateData
MultiXactId oldestMultiXactId;
Oid oldestMultiXactDB;
- /*
- * Oldest multixact offset that is potentially referenced by a multixact
- * referenced by a relation. We don't always know this value, so there's
- * a flag here to indicate whether or not we currently do.
- */
- MultiXactOffset oldestOffset;
- bool oldestOffsetKnown;
-
/* support for anti-wraparound measures */
MultiXactId multiVacLimit;
MultiXactId multiWarnLimit;
@@ -398,7 +390,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
static void WriteMTruncateXlogRec(Oid oldestMultiDB,
@@ -2284,16 +2275,13 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
MultiXactId curMulti;
- bool needs_offset_vacuum;
Assert(MultiXactIdIsValid(oldest_datminmxid));
/*
* We pretend that a wrap will happen halfway through the multixact ID
* space, but that's not really true, because multixacts wrap differently
- * from transaction IDs. Note that, separately from any concern about
- * multixact IDs wrapping, we must ensure that multixact members do not
- * wrap. Limits for that are set in SetOffsetVacuumLimit, not here.
+ * from transaction IDs.
*/
multiWrapLimit = oldest_datminmxid + (MaxMultiXactId >> 1);
if (multiWrapLimit < FirstMultiXactId)
@@ -2361,9 +2349,6 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
Assert(!InRecovery);
- /* Set limits for offset vacuum. */
- needs_offset_vacuum = SetOffsetVacuumLimit(is_startup);
-
/*
* If past the autovacuum force point, immediately signal an autovac
* request. The reason for this is that autovac only processes one
@@ -2371,8 +2356,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
* database, it'll call here, and we'll signal the postmaster to start
* another iteration immediately if there are still any old databases.
*/
- if ((MultiXactIdPrecedes(multiVacLimit, curMulti) ||
- needs_offset_vacuum) && IsUnderPostmaster)
+ if (MultiXactIdPrecedes(multiVacLimit, curMulti) && IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
/* Give an immediate warning if past the wrap warn point */
@@ -2615,109 +2599,6 @@ GetOldestMultiXactId(void)
return oldestMXact;
}
-/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
- *
- * To do so determine what's the oldest member offset and install the limit
- * info in MultiXactState, where it can be used to prevent overrun of old data
- * in the members SLRU area.
- *
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
- */
-static bool
-SetOffsetVacuumLimit(bool is_startup)
-{
- MultiXactId oldestMultiXactId;
- MultiXactId nextMXact;
- MultiXactOffset oldestOffset = 0; /* placate compiler */
- MultiXactOffset prevOldestOffset;
- MultiXactOffset nextOffset;
- bool oldestOffsetKnown = false;
- bool prevOldestOffsetKnown;
-
- /*
- * NB: Have to prevent concurrent truncation, we might otherwise try to
- * lookup an oldestMulti that's concurrently getting truncated away.
- */
- LWLockAcquire(MultiXactTruncationLock, LW_SHARED);
-
- /* Read relevant fields from shared memory. */
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMXact = MultiXactState->nextMXact;
- nextOffset = MultiXactState->nextOffset;
- prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- prevOldestOffset = MultiXactState->oldestOffset;
- Assert(MultiXactState->finishedStartup);
- LWLockRelease(MultiXactGenLock);
-
- /*
- * Determine the offset of the oldest multixact. Normally, we can read
- * the offset from the multixact itself, but there's an important special
- * case: if there are no multixacts in existence at all, oldestMXact
- * obviously can't point to one. It will instead point to the multixact
- * ID that will be assigned the next time one is needed.
- */
- if (oldestMultiXactId == nextMXact)
- {
- /*
- * When the next multixact gets created, it will be stored at the next
- * offset.
- */
- oldestOffset = nextOffset;
- oldestOffsetKnown = true;
- }
- else
- {
- /*
- * Figure out where the oldest existing multixact's offsets are
- * stored. Due to bugs in early release of PostgreSQL 9.3.X and 9.4.X,
- * the supposedly-earliest multixact might not really exist. We are
- * careful not to fail in that case.
- */
- oldestOffsetKnown =
- find_multixact_start(oldestMultiXactId, &oldestOffset);
-
- if (!oldestOffsetKnown)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
- oldestMultiXactId)));
- }
-
- LWLockRelease(MultiXactTruncationLock);
-
- /*
- * If we can, compute limits (and install them MultiXactState) to prevent
- * overrun of old data in the members SLRU area. We can only do so if the
- * oldest offset is known though.
- */
- if (prevOldestOffsetKnown)
- {
- /*
- * If we failed to get the oldest offset this time, but we have a
- * value from a previous pass through this function, use the old
- * values rather than automatically forcing an emergency autovacuum
- * cycle again.
- */
- oldestOffset = prevOldestOffset;
- oldestOffsetKnown = true;
- }
-
- /* Install the computed values */
- LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
- MultiXactState->oldestOffset = oldestOffset;
- MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- /*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
- */
- return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2761,101 +2642,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ac8f5d9c25..97dd6bc8e2 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1131,10 +1131,9 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
/*
* Also compute the multixact age for which freezing is urgent. This is
- * normally autovacuum_multixact_freeze_max_age, but may be less if we are
- * short of multixact member space.
+ * autovacuum_multixact_freeze_max_age.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..e9285ba44c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1912,10 +1912,9 @@ do_autovacuum(void)
/*
* Compute the multixact age for which freezing is urgent. This is
- * normally autovacuum_multixact_freeze_max_age, but may be less if we are
- * short of multixact member space.
+ * autovacuum_multixact_freeze_max_age.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
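
To make it easier to see what the 0003 patch removes, here is a standalone sketch of the clamping logic from MultiXactMemberFreezeThreshold(), with the shared-memory reads replaced by plain arguments. The function name member_freeze_threshold() and its parameter list are hypothetical; the safe and danger thresholds are passed in rather than derived from MaxMultiXactOffset, since their definitions are not part of the quoted hunks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the removed computation.  'members' and 'multixacts' are the
 * counts currently in use; 'safe' and 'danger' stand in for
 * MULTIXACT_MEMBER_SAFE_THRESHOLD and MULTIXACT_MEMBER_DANGER_THRESHOLD;
 * 'max_age' stands in for autovacuum_multixact_freeze_max_age.
 */
static int
member_freeze_threshold(uint64_t members, uint32_t multixacts,
						uint64_t safe, uint64_t danger, int max_age)
{
	double		fraction;
	uint32_t	victim_multixacts;
	int			result;

	assert(danger > safe);

	/* Low member-space utilization: no special action. */
	if (members <= safe)
		return max_age;

	/* How far past the safe threshold are we, relative to the danger one? */
	fraction = (double) (members - safe) / (double) (danger - safe);

	/*
	 * At or beyond the danger threshold the lowest possible freeze age is
	 * zero.  (The original tests victim_multixacts > multixacts instead;
	 * checking the fraction first gives the same result and avoids an
	 * out-of-range double-to-integer conversion in this sketch.)
	 */
	if (fraction >= 1.0)
		return 0;

	victim_multixacts = (uint32_t) (multixacts * fraction);
	result = (int) (multixacts - victim_multixacts);

	/* Never be less aggressive than the configured max age. */
	return result < max_age ? result : max_age;
}
```

If the thresholds scale with a UINT64_MAX offset space, the first branch is taken for any realistic member count, so the function would always return the configured max age. That matches the commit message's reasoning for replacing the call with autovacuum_multixact_freeze_max_age directly.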
[application/octet-stream] v4-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 5-v4-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
From 5de021e3b012dbf71bb6b2893cd77864236bffcb Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v4 1/4] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 320b14add1..4846126ef9 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -877,8 +877,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.43.0
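
The 0001 patch applies one pattern throughout: print the offset with %llu and cast the argument to unsigned long long, so the format string stays correct whatever the underlying width of MultiXactOffset is. A minimal standalone sketch of that pattern follows; the typedef is a local stand-in for the widened type and format_next_offset() is a hypothetical helper, not something from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for the widened typedef; an assumption for this sketch. */
typedef uint64_t MultiXactOffset;

/*
 * Format an offset the way the patched printf() calls do: %llu plus an
 * explicit cast, which is portable across platforms where uint64_t is
 * either "unsigned long" or "unsigned long long".
 */
static int
format_next_offset(char *dst, size_t dstlen, MultiXactOffset off)
{
	assert(dst != NULL);

	return snprintf(dst, dstlen, "NextMultiOffset: %llu",
					(unsigned long long) off);
}
```

With the old %u format, a value such as 2^32 would be passed as a 64-bit argument to a 32-bit conversion, which is undefined behavior in C; the cast plus %llu avoids that.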
[application/octet-stream] v4-0002-Use-64-bit-multixact-offsets.patch (13.3K, 6-v4-0002-Use-64-bit-multixact-offsets.patch)
From cca60a5e487090252dddd515c716272786841c5e Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v4 2/4] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 55dec71a6d..556fffa333 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -652,7 +652,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-10-23 15:55 ` Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-10-23 15:55 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; wenhui qiu <[email protected]>; Postgres hackers <[email protected]>
After a bit of thought, I've realized that being conservative here is the
way to go.
We can reuse most of the existing logic. I mean, we can remove the offset
wraparound "error logic" and keep the "warning logic", but set the threshold
for the "warning logic" to a much higher value. For now, I chose 2^32-1. In
other words, the sensible logic here, in my view, is to trigger autovacuum
if the number of offsets (i.e. the difference nextOffset - oldestOffset)
exceeds 2^32-1. PFA patch set.
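To make the proposed check concrete, here is a minimal, standalone sketch of
the rule described above. It is not taken from the patch: the function name
members_need_autovacuum and the macro MEMBER_AUTOVAC_THRESHOLD are
hypothetical, and the typedef only mimics the 64-bit MultiXactOffset. Treating
an unknown oldest offset as "request autovacuum" follows the existing check in
GetNewMultiXactId.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit typedef introduced by the patch. */
typedef uint64_t MultiXactOffset;

/* Assumed threshold from this mail: 2^32 - 1 member offsets in use. */
#define MEMBER_AUTOVAC_THRESHOLD UINT64_C(0xFFFFFFFF)

/*
 * Return true when autovacuum should be requested: either the oldest
 * offset is unknown, or the number of offsets in use
 * (next_offset - oldest_offset) exceeds the threshold.
 */
static bool
members_need_autovacuum(bool oldest_offset_known,
                        MultiXactOffset next_offset,
                        MultiXactOffset oldest_offset)
{
    if (!oldest_offset_known)
        return true;

    /* Unsigned 64-bit subtraction gives the distance between the offsets. */
    return (next_offset - oldest_offset) > MEMBER_AUTOVAC_THRESHOLD;
}
```

With these definitions, a gap of exactly 2^32-1 offsets does not yet trigger
the request, while a gap of 2^32 does.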
--
Best regards,
Maxim Orlov.
Attachments:
[application/octet-stream] v5-0002-Use-64-bit-multixact-offsets.patch (13.3K, 3-v5-0002-Use-64-bit-multixact-offsets.patch)
From cca60a5e487090252dddd515c716272786841c5e Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v5 2/4] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 55dec71a6d..556fffa333 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -652,7 +652,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
[application/octet-stream] v5-0003-Make-pg_upgrade-convert-multixact-offsets.patch (12.4K, 4-v5-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
From 6da93c5db43d5f8c340cc45e47bc73752f16c72c Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v5 3/4] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 29 ++-
src/bin/pg_upgrade/pg_upgrade.h | 13 +-
src/bin/pg_upgrade/segresize.c | 350 ++++++++++++++++++++++++++++++++
5 files changed, 390 insertions(+), 4 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..d9d8d0ea78 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,7 +750,30 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ uint64 oldest_offset = convert_multixact_offsets();
+
+ if (oldest_offset)
+ {
+ uint64 next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+
+ /* Handle possible wraparound. */
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+
copy_subdir_files("pg_multixact/members", "pg_multixact/members");
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +783,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..4d65e4125e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,7 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+uint64 convert_multixact_offsets(void);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..e47c0a2407
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,350 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+ bool long_segment_names;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ *
+ * NOTE: this function should mirror SlruFileName call.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ if (state->long_segment_names)
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+ return psprintf("%s/%015llX", state->dir, (long long) state->segno);
+ }
+ else
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+ }
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file exists; read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found. This avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ state->pageno++;
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ state->segno++;
+ state->pageno = 0;
+ close_segment(state);
+ }
+}
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+uint64
+convert_multixact_offsets(void)
+{
+ /* See multixact.c */
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(uint32))
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
+
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ uint32 oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE] = {0};
+ /*
+ * It is much easier to deal with multi wraparound in 64-bit format. Thus
+ * we use 64-bit variables for multixact IDs, although they remain 32 bits.
+ */
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool found = false;
+ uint64 oldest_offset = 0;
+
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ oldseg.long_segment_names = false; /* old format XXXX */
+
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+ newseg.long_segment_names = true;
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound. */
+ if (oldseg.segno > MaxMultiXactId /
+ MULTIXACT_OFFSETS_PER_PAGE_OLD /
+ SLRU_PAGES_PER_SEGMENT)
+ oldseg.segno = 0;
+
+ /* Read old offset segment. */
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (oldlen <= 0 || empty)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Fill possible gap. */
+ if (oldlen < BLCKSZ)
+ memset((char *) oldbuf + oldlen, 0, BLCKSZ - oldlen);
+
+ /* Save oldest multi offset */
+ if (!found)
+ {
+ oldest_offset = oldbuf[old_entry];
+ found = true;
+ }
+
+ /* ... skip wrapped-around invalid multi */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page. */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound. */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1. */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE)
+ {
+ /* Write a new page. */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page. */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Release resources. */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ check_ok();
+
+ return oldest_offset;
+}
--
2.43.0
[application/octet-stream] v5-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (8.8K, 5-v5-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
From 95c4613092e4884fb2162624c4fb1dcf5f94c6f6 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 23 Oct 2024 18:23:39 +0300
Subject: [PATCH v5 4/4] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is UINT64_MAX now, the MULTIXACT_MEMBER_SAFE_THRESHOLD and
MULTIXACT_MEMBER_DANGER_THRESHOLD values are not meaningful any more. Thus,
MultiXactMemberFreezeThreshold is no longer needed either.
Instead, switch to the MULTIXACT_MEMBER_AUTOVAC_THRESHOLD (2^32 - 1) members
threshold. It is used to determine whether we need to force autovacuum.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 116 +++----------------------
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/include/access/multixact.h | 1 -
4 files changed, 14 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c51e03e832..7f12217309 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -204,10 +204,13 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release the disk space if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -2616,15 +2619,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine whether we need to vacuum to free multixact member space.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2712,10 +2713,10 @@ SetOffsetVacuumLimit(bool is_startup)
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2761,101 +2762,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ac8f5d9c25..b04d864095 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1134,7 +1134,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..180bb7e96e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1915,7 +1915,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
[application/octet-stream] v5-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 6-v5-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
From 5de021e3b012dbf71bb6b2893cd77864236bffcb Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v5 1/4] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 320b14add1..4846126ef9 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -877,8 +877,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.43.0
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-10-25 03:38 ` wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: wenhui qiu @ 2024-10-25 03:38 UTC (permalink / raw)
To: Maxim Orlov <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
Hi Maxim Orlov,
> After a bit of thought, I've realized that to be conservative here is the
> way to go.
> We can reuse a maximum of existing logic. I mean, we can remove offset
> wraparound "error logic" and reuse "warning logic". But set the threshold
> for "warning logic" to a much higher value. For now, I choose 2^32-1. In
> other words, legit logic, in my view, here would be to trigger autovacuum
> if the number of offsets (i.e. difference nextOffset - oldestOffset)
> exceeds 2^32-1. PFA patch set.
Good point, I couldn't agree more. xid64 is the solution to the
wraparound problem, so the previous error log is no longer meaningful. But we
might want to refine the warning log output a little (for example, by
pointing at the underlying reasons why the age keeps increasing), even though
we no longer have to worry about xid wraparound.
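For readers following the thread, the check described in the quoted paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: the helper name needs_member_autovacuum and the macro name MEMBER_AUTOVAC_THRESHOLD are hypothetical stand-ins, and plain C99 types replace MultiXactOffset. The condition mirrors the return expression of SetOffsetVacuumLimit in the v5 patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for MULTIXACT_MEMBER_AUTOVAC_THRESHOLD from the patch: 2^32 - 1. */
#define MEMBER_AUTOVAC_THRESHOLD UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical helper (not part of the patch): decide whether autovacuum
 * should be requested, given the 64-bit next and oldest member offsets.
 * If the oldest offset is not known, assume the answer is yes.
 */
static bool
needs_member_autovacuum(bool oldest_offset_known,
                        uint64_t next_offset,
                        uint64_t oldest_offset)
{
    if (!oldest_offset_known)
        return true;

    /* Number of member offsets currently in use. */
    return next_offset - oldest_offset > MEMBER_AUTOVAC_THRESHOLD;
}
```

Note that the comparison is strict, so a span of exactly 2^32 - 1 offsets does not yet request autovacuum in this sketch.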
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release the disk space if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
Can we refine this comment a bit? For example:
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release disk space and reduce table bloat,
+ * if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
Thanks
Maxim Orlov <[email protected]> 于2024年10月23日周三 23:55写道:
> After a bit of thought, I've realized that to be conservative here is the
> way to go.
>
> We can reuse a maximum of existing logic. I mean, we can remove offset
> wraparound "error logic" and reuse "warning logic". But set the threshold
> for "warning logic" to a much higher value. For now, I choose 2^32-1. In
> other words, legit logic, in my view, here would be to trigger autovacuum
> if the number of offsets (i.e. difference nextOffset - oldestOffset)
> exceeds 2^32-1. PFA patch set.
>
> --
> Best regards,
> Maxim Orlov.
>
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
@ 2024-11-08 18:10 ` Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-11-08 18:10 UTC (permalink / raw)
To: wenhui qiu <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
On Fri, 25 Oct 2024 at 06:39, wenhui qiu <[email protected]> wrote:
>
> + * Multixact members warning threshold.
> + *
> + * If the difference between nextOffset and oldestOffset exceeds this value, we
> + * trigger autovacuum in order to release the disk space if possible.
> + */
> +#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
> Can we refine this comment a bit? For example:
>
Thank you, fixed.
Sorry for the late reply. There was a problem with offset wraparound during
upgrade. Here is a fixed version, with a test added. I decided to use my
old patch to set non-standard multixact values for the old cluster, fill it
with data, and run pg_upgrade.
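As background on the wraparound case, the per-entry rebasing done by the v5 conversion loop shown earlier in this thread can be sketched as below. This is a minimal illustration, assuming plain C99 types and a hypothetical helper name rebase_old_offset; it mirrors the v5 arithmetic only and does not claim to show what the v6 fix changed. The 2^32 - 1 cycle length is taken from the v5 code; the likely reason is that a wrapped 32-bit offset never takes the value 0.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): convert one old 32-bit
 * member offset into the new 64-bit numbering, where the oldest offset
 * of the old cluster becomes 1.
 */
static uint64_t
rebase_old_offset(uint32_t old_offset, uint32_t oldest_offset)
{
    uint64_t offset = old_offset;

    /*
     * An offset numerically below the oldest one has wrapped around.
     * The v5 patch adds 2^32 - 1 here, i.e. it treats the old offset
     * space as a cycle of 2^32 - 1 values.
     */
    if (offset < oldest_offset)
        offset += (UINT64_C(1) << 32) - 1;

    /* Subtract the oldest offset so that new offsets start from 1. */
    return offset - oldest_offset + 1;
}
```

With an oldest offset of 0xFFFFFFF0, for example, the old offset 0xFFFFFFFF maps to 16 and the wrapped offset 1 maps to 17, so the new numbering stays contiguous across the wrap.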
Here is how to test. All the patches are against master at commit
14e87ffa5c543b5f3.
1) Check out master at 14e87ffa5c543b5f3 and apply the patches
0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch and
0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch.
2) Check out master at 14e87ffa5c543b5f3 in a separate directory and
apply the v6 patch set.
3) Build both trees.
4) Set the oldinstall environment variable and run the test:
PROVE_TESTS=t/005_mxidoff.pl oldinstall=/home/orlov/proj/pgsql-new PG_TEST_NOCLEAN=1 make check -C src/bin/pg_upgrade/
If required, I can write a shell script to automate these steps.
--
Best regards,
Maxim Orlov.
Attachments:
[application/octet-stream] v6-0001-Use-64-bit-multixact-offsets.patch (13.3K, 3-v6-0001-Use-64-bit-multixact-offsets.patch)
From 2a8708fa5d31c6523c7d2654ee1215beda6f1ff0 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v6 1/6] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..e1b3187d0b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -664,7 +664,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
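
The last multixact.c hunk above widens the signed difference in MultiXactOffsetPrecedes from int32 to int64. A minimal standalone sketch of what that changes is below. The function names offset_precedes32 and offset_precedes64 are made up for illustration and are not part of the patch. The sketch assumes the usual two's-complement behaviour for the unsigned-to-signed cast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the pre-patch comparison: offsets are 32 bits
 * wide and compared modulo 2^32, so a value just below UINT32_MAX
 * "precedes" a small value that was allocated after the counter wrapped.
 */
static bool
offset_precedes32(uint32_t offset1, uint32_t offset2)
{
	int32_t		diff = (int32_t) (offset1 - offset2);

	return diff < 0;
}

/*
 * Hypothetical stand-in for the post-patch comparison: the same idiom on
 * 64-bit offsets. Values that fit in 32 bits now compare in plain numeric
 * order, because they are nowhere near the 2^63 distance where the sign
 * of the difference flips.
 */
static bool
offset_precedes64(uint64_t offset1, uint64_t offset2)
{
	int64_t		diff = (int64_t) (offset1 - offset2);

	return diff < 0;
}
```

With 32-bit offsets, 0xFFFFFFF0 is treated as older than 5. With 64-bit offsets the same pair compares numerically, which is why the explicit members-wraparound bookkeeping (offsetStopLimit, MultiXactOffsetWouldWrap) can be dropped in this patch.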
[application/octet-stream] v6-0002-Make-pg_upgrade-convert-multixact-offsets.patch (18.3K, 4-v6-0002-Make-pg_upgrade-convert-multixact-offsets.patch)
From a48ec9aaf3de859050dd0ad484dc1fb5f174cf8a Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v6 2/6] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
Author: Yura Sokolov <[email protected]>
---
src/backend/access/transam/multixact.c | 2 +-
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 42 +-
src/bin/pg_upgrade/pg_upgrade.h | 14 +-
src/bin/pg_upgrade/segresize.c | 518 +++++++++++++++++++++++++
6 files changed, 572 insertions(+), 6 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c51e03e832..48e1c0160a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1891,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- true);
+ false);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..1654e877c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,8 +750,42 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
- copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ MultiXactOffset oldest_offset,
+ next_offset;
+
+ remove_new_subdir("pg_multixact/offsets", false);
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+ oldest_offset = convert_multixact_offsets();
+ check_ok();
+
+ remove_new_subdir("pg_multixact/members", false);
+ prep_status("Converting pg_multixact/members");
+ convert_multixact_members(oldest_offset);
+ check_ok();
+
+ next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+ if (oldest_offset)
+ {
+ if (next_offset < oldest_offset)
+ next_offset += ((MultiXactOffset) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ {
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ }
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +794,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..2c85ec1e94 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,8 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+MultiXactOffset convert_multixact_offsets(void);
+void convert_multixact_members(MultiXactOffset oldest_offset);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..ff7ff65758
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,518 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ *
+ * NOTE: this function should mirror SlruFileName call.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+
+ return psprintf("%s/%04X", state->dir, (unsigned int) (state->segno));
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file does exist, read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found. This avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+}
+
+typedef uint32 MultiXactOffsetOld;
+
+#define MaxMultiXactOffsetOld ((MultiXactOffsetOld) 0xFFFFFFFF)
+
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(MultiXactOffsetOld))
+#define MULTIXACT_OFFSETS_PER_PAGE_NEW (BLCKSZ / sizeof(MultiXactOffset))
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+MultiXactOffset
+convert_multixact_offsets(void)
+{
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ MultiXactOffsetOld oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE_NEW] = {0},
+ oldest_offset = 0;
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi;
+ uint64 old_entry;
+ uint64 new_entry;
+ bool oldest_offset_known = false;
+
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ /* Copy multi offsets reading only needed segment pages */
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool is_empty;
+
+ /* Handle possible segment wraparound */
+ if (oldseg.segno > MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_OLD / SLRU_PAGES_PER_SEGMENT)
+ oldseg.segno = 0;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &is_empty);
+
+ if (oldlen <= 0 || is_empty)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ if (oldlen < BLCKSZ)
+ memset((char *) oldbuf + oldlen, 0, BLCKSZ - oldlen);
+
+ /* Save oldest multi offset */
+ if (!oldest_offset_known)
+ {
+ oldest_offset = oldbuf[old_entry];
+ oldest_offset_known = true;
+ }
+
+ /* Skip wrapped-around invalid MultiXactIds */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound (1 becomes 2^32) */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1 */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE_NEW)
+ {
+ /* Write new page */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE_NEW - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Use next_offset as oldest_offset, if oldest_multi == next_multi */
+ if (!oldest_offset_known)
+ {
+ Assert(oldest_multi == next_multi);
+ oldest_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ }
+
+ /* Release resources */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ return oldest_offset;
+}
+
+#define MXACT_MEMBERS_FLAG_BYTES 1
+
+#define MULTIXACT_MEMBERS_PER_GROUP 4
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (MULTIXACT_MEMBERS_PER_GROUP * (sizeof(TransactionId) + MXACT_MEMBERS_FLAG_BYTES))
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE \
+ (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERS_PER_GROUP * MULTIXACT_MEMBERGROUPS_PER_PAGE)
+#define MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP \
+ (MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP)
+
+typedef struct MultiXactMembersCtx
+{
+ SlruSegState seg;
+ char buf[BLCKSZ];
+ int group;
+ int member;
+ char *flag;
+ TransactionId *xid;
+} MultiXactMembersCtx;
+
+static void
+MultiXactMembersCtxInit(MultiXactMembersCtx *ctx)
+{
+ ctx->seg.dir = psprintf("%s/pg_multixact/members", new_cluster.pgdata);
+
+ ctx->group = 0;
+ ctx->member = 1; /* skip invalid zero offset */
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+
+ ctx->flag += ctx->member;
+ ctx->xid += ctx->member;
+}
+
+static void
+MultiXactMembersCtxAdd(MultiXactMembersCtx *ctx, char flag, TransactionId xid)
+{
+ /* Copy member's xid and flags to the new page */
+ *ctx->flag++ = flag;
+ *ctx->xid++ = xid;
+
+ if (++ctx->member < MULTIXACT_MEMBERS_PER_GROUP)
+ return;
+
+ /* Start next member group */
+ ctx->member = 0;
+
+ if (++ctx->group >= MULTIXACT_MEMBERGROUPS_PER_PAGE)
+ {
+ /* Write current page and start new */
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ ctx->group = 0;
+ memset(ctx->buf, 0, BLCKSZ);
+ }
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+}
+
+static void
+MultiXactMembersCtxFinit(MultiXactMembersCtx *ctx)
+{
+ if (ctx->flag > (char *) ctx->buf)
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ close_segment(&ctx->seg);
+
+ pfree(ctx->seg.dir);
+}
+
+/*
+ * Convert pg_multixact/members segments, offsets will start from 1.
+ */
+void
+convert_multixact_members(MultiXactOffset oldest_offset)
+{
+ MultiXactOffset next_offset;
+ MultiXactOffset offset;
+ SlruSegState oldseg = {0};
+ char oldbuf[BLCKSZ] = {0};
+ int oldidx;
+ MultiXactMembersCtx newctx = {0};
+
+ oldseg.dir = psprintf("%s/pg_multixact/members", old_cluster.pgdata);
+
+ next_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ /* Initialize the old starting position */
+ oldseg.pageno = oldest_offset / MULTIXACT_MEMBERS_PER_PAGE;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ /* Initialize new starting position */
+ MultiXactMembersCtxInit(&newctx);
+
+ /* Iterate through the original directory */
+ oldidx = oldest_offset % MULTIXACT_MEMBERS_PER_PAGE;
+ for (offset = oldest_offset; offset < next_offset;)
+ {
+ bool empty;
+ int oldlen;
+ int ngroups;
+ int oldgroup;
+ int oldmember;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Iterate through the old member groups */
+ ngroups = oldlen / MULTIXACT_MEMBERGROUP_SIZE;
+ oldmember = oldidx % MULTIXACT_MEMBERS_PER_GROUP;
+ oldgroup = oldidx / MULTIXACT_MEMBERS_PER_GROUP;
+ while (oldgroup < ngroups && offset < next_offset)
+ {
+ char *oldflag;
+ TransactionId *oldxid;
+ int i;
+
+ oldflag = (char *) oldbuf + oldgroup * MULTIXACT_MEMBERGROUP_SIZE;
+ oldxid = (TransactionId *)(oldflag + MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP);
+
+ oldxid += oldmember;
+ oldflag += oldmember;
+
+ /* Iterate through the old members */
+ for (i = oldmember;
+ i < MULTIXACT_MEMBERS_PER_GROUP && offset < next_offset;
+ i++)
+ {
+ MultiXactMembersCtxAdd(&newctx, *oldflag++, *oldxid++);
+
+ if (++offset == (uint64) 1 << 32)
+ {
+ Assert(i == MaxMultiXactOffsetOld % MULTIXACT_MEMBERS_PER_GROUP);
+ goto wraparound;
+ }
+ }
+
+ oldgroup++;
+ oldmember = 0;
+ }
+
+ oldidx = 0;
+
+ continue;
+
+wraparound:
+#define SEGNO_MAX MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE / SLRU_PAGES_PER_SEGMENT
+#define PAGENO_MAX MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE % SLRU_PAGES_PER_SEGMENT
+ Assert((oldseg.segno == SEGNO_MAX && oldseg.pageno == PAGENO_MAX + 1) ||
+ (oldseg.segno == SEGNO_MAX + 1 && oldseg.pageno == 0));
+
+ /* Switch to segment 0000 */
+ close_segment(&oldseg);
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+
+ /* skip invalid zero multi offset */
+ oldidx = 1;
+ }
+
+ MultiXactMembersCtxFinit(&newctx);
+
+ /* Release resources */
+ close_segment(&oldseg);
+
+ pfree(oldseg.dir);
+}
--
2.43.0
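
The offset arithmetic that segresize.c applies to every entry (add 2^32 - 1 when the old value has wrapped past oldest_offset, then shift so the oldest offset becomes 1) can be tried in isolation. The sketch below is only an illustration: rebase_offset is a made-up name, not a function from the patch, and it assumes, as the patch does, that old offset 0 is never used.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the per-entry arithmetic in
 * convert_multixact_offsets(): map an old 32-bit member offset to the
 * new 64-bit numbering, in which the oldest surviving offset becomes 1.
 */
static uint64_t
rebase_offset(uint32_t old_offset, uint32_t oldest_offset)
{
	uint64_t	offset = old_offset;

	/*
	 * An old offset numerically below the oldest one was allocated after
	 * the 32-bit counter wrapped. Offset 0 is skipped on wraparound, so
	 * the wrap distance is 2^32 - 1 rather than 2^32.
	 */
	if (offset < oldest_offset)
		offset += ((uint64_t) 1 << 32) - 1;

	/* Shift so that the oldest offset maps to 1. */
	return offset - oldest_offset + 1;
}
```

For example, with oldest_offset = 0xFFFFFFFF the old offsets 0xFFFFFFFF and 1 are adjacent (0 is skipped), and they map to 1 and 2 in the new numbering. The same adjustment is applied to chkpnt_nxtmxoff in pg_upgrade.c before it is passed to pg_resetwal -O.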
[application/octet-stream] v6-0004-TEST-lower-SLRU_PAGES_PER_SEGMENT-set-bump-catver.patch (2.1K, 5-v6-0004-TEST-lower-SLRU_PAGES_PER_SEGMENT-set-bump-catver.patch)
From 970940711a6a4eab4e30f05412dba90fe2570433 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 29 Oct 2024 18:28:40 +0300
Subject: [PATCH v6 4/6] TEST: lower SLRU_PAGES_PER_SEGMENT + set bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/bin/pg_upgrade/segresize.c | 2 +-
src/include/access/slru.h | 2 +-
src/include/catalog/catversion.h | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..01252a7ed5 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411082
/*
* large object chunk size added to pg_controldata,
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
index ff7ff65758..0547b51741 100644
--- a/src/bin/pg_upgrade/segresize.c
+++ b/src/bin/pg_upgrade/segresize.c
@@ -13,7 +13,7 @@
#include "access/multixact.h"
/* See slru.h */
-#define SLRU_PAGES_PER_SEGMENT 32
+#define SLRU_PAGES_PER_SEGMENT 2
/*
* Some kind of iterator associated with a particular SLRU segment. The idea is
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 97e612cd10..74dd54819d 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -36,7 +36,7 @@
* take no explicit notice of that fact in slru.c, except when comparing
* segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
*/
-#define SLRU_PAGES_PER_SEGMENT 32
+#define SLRU_PAGES_PER_SEGMENT 2
/*
* Page status codes. Note that these do not include the "dirty" bit.
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 86436e0356..05048a512b 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411081
+#define CATALOG_VERSION_NO 202411082
#endif
--
2.43.0
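
Lowering SLRU_PAGES_PER_SEGMENT to 2 makes segment boundaries show up after far fewer multixacts, which is what exercises the segment-switching paths in segresize.c. The sketch below shows how a multixact ID maps to a segment, page and entry for a given entry width and segment size. SlruPos and locate_offset_entry are illustrative names only, and the block size is passed as a parameter instead of using BLCKSZ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: position of one entry inside an SLRU directory. */
typedef struct SlruPos
{
	uint64_t	segno;		/* segment file number */
	uint64_t	pageno;		/* page within the segment */
	uint64_t	entry;		/* entry index within the page */
} SlruPos;

/*
 * Hypothetical helper following the same divisions as the setup code in
 * convert_multixact_offsets(): entries per page come from the block size
 * and the entry width, pages are then grouped into segments.
 */
static SlruPos
locate_offset_entry(uint64_t multi, size_t entry_size,
					unsigned blcksz, unsigned pages_per_segment)
{
	uint64_t	per_page = blcksz / entry_size;
	uint64_t	page = multi / per_page;
	SlruPos		pos;

	pos.entry = multi % per_page;
	pos.segno = page / pages_per_segment;
	pos.pageno = page % pages_per_segment;

	return pos;
}
```

With 8192-byte blocks, multixact 100000 sits in segment 1 with 4-byte offsets and 32 pages per segment, in segment 3 with 8-byte offsets, and in segment 48 once the segment size is lowered to 2 pages. The change of $mult from 32 * $blcksz / 4 to 32 * $blcksz / 8 in 001_basic.pl reflects the same halving of entries per page.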
[application/octet-stream] v6-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch (25.4K, 6-v6-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch)
From 6e959f89e37614b94d3c4dd5695355095e8c38fd Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v6 5/6] TEST: initdb option to initialize cluster with
non-standard xid/mxid/mxoff
Until now, testing database cluster wraparound has not been easy, because initdb
always initializes the cluster with the default xid/mxid/mxoff. An option to specify
any valid xid/mxid/mxoff at cluster startup makes this easier.
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 ++++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++--
src/backend/bootstrap/bootstrap.c | 50 +++++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 +++-
src/backend/tcop/postgres.c | 53 +++++++++++-
src/bin/initdb/initdb.c | 107 ++++++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++++
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
13 files changed, 382 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the commit log page that holds nextXid */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a817f539ee..095c39dd93 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the offsets log page that holds nextMXact */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -1978,7 +1999,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the members log page that holds nextOffset */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we're not starting from offset zero, initialize a dummy multixact to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ed59dfce89..05ce03a3a3 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -216,7 +216,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -271,12 +271,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid value")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 8bee1fb664..af4b004e04 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -562,7 +562,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -659,10 +659,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -713,6 +721,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index aac0b96bbc..1f0e27b9bf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..410868dddf 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^32-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^32-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^32-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Reject this rather than silently raising mxid to
+ * FirstMultiXactId, although doing so would be harmless.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Reject this rather than silently raising xid to
+ * FirstNormalTransactionId, although doing so would be harmless.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index e1b3187d0b..f770e9a140 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
--
2.43.0
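For readers skimming the patch above, the argument check that is repeated for -m, -o and -x in bootstrap.c, postgres.c and initdb.c can be sketched in isolation. This is only an illustration under stated assumptions: parse_start_value() and START_VALUE_IS_VALID() are hypothetical names that do not appear in the patch, and the standard strtoull() stands in for the strtou64() used there.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Same bound as the patch's StartTransactionIdIsValid() and friends:
 * the starting value must fit in 32 bits.  (Hypothetical macro name.)
 */
#define START_VALUE_IS_VALID(v) ((v) <= UINT64_C(0xFFFFFFFF))

/*
 * Hypothetical helper, not part of the patch.  Returns true and stores the
 * parsed value in *result if arg is a complete, in-range number.
 */
static bool
parse_start_value(const char *arg, uint64_t *result)
{
	char	   *endptr;
	unsigned long long val;

	errno = 0;
	/* Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal. */
	val = strtoull(arg, &endptr, 0);

	/* Reject empty input, trailing garbage, overflow and too-large values. */
	if (endptr == arg || *endptr != '\0' || errno != 0 ||
		!START_VALUE_IS_VALID(val))
		return false;

	*result = (uint64_t) val;
	return true;
}
```

With such a helper, each option case would reduce to one call plus the error report, which might also keep the three error messages for the xid option consistent across files.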
[application/octet-stream] v6-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (8.8K, 7-v6-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
From d5f1e8880a5f072c389274954b21f982797af47e Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 23 Oct 2024 18:23:39 +0300
Subject: [PATCH v6 3/6] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is now UINT64_MAX, the MULTIXACT_MEMBER_SAFE_THRESHOLD
and MULTIXACT_MEMBER_DANGER_THRESHOLD values are no longer meaningful. Thus,
MultiXactMemberFreezeThreshold is not needed either.
Instead, switch to a MULTIXACT_MEMBER_AUTOVAC_THRESHOLD (2^32 - 1) members
threshold. It is used to determine whether we need to force autovacuum.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 117 +++----------------------
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/include/access/multixact.h | 1 -
4 files changed, 15 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 48e1c0160a..a817f539ee 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -204,10 +204,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value,
+ * we trigger autovacuum in order to release disk space and reduce table bloat
+ * if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -2616,15 +2620,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine whether we need to vacuum to reclaim member space.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2712,10 +2714,10 @@ SetOffsetVacuumLimit(bool is_startup)
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2761,101 +2763,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 86f36b3695..e7506e268a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,7 +1133,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..180bb7e96e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1915,7 +1915,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
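With MultiXactMemberFreezeThreshold() removed, the hunks above use autovacuum_multixact_freeze_max_age directly. As a rough illustration of the arithmetic in do_start_worker(), here is a minimal standalone sketch. The typedef, the macro and the function name multi_force_limit are local stand-ins written for this example, not code from the patch, and the sketch assumes MultiXactId remains an unsigned 32-bit counter.

```c
#include <stdint.h>

/* Local stand-ins for this example only; assumes a 32-bit MultiXactId. */
typedef uint32_t MultiXactId;
#define FirstMultiXactId ((MultiXactId) 1)

/*
 * Compute the oldest multixact id that does not yet force a vacuum,
 * given the most recent id and the configured freeze max age.
 */
static MultiXactId
multi_force_limit(MultiXactId recent_multi, int freeze_max_age)
{
    /* Unsigned subtraction wraps modulo 2^32 when the age exceeds the id. */
    MultiXactId limit = recent_multi - (MultiXactId) freeze_max_age;

    /* Zero is not a valid multixact id, so step back over it. */
    if (limit < FirstMultiXactId)
        limit -= FirstMultiXactId;

    return limit;
}
```

For example, multi_force_limit(1000, 400) yields 600, while an age larger than the current id wraps around to a value near the top of the 32-bit range.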
[application/octet-stream] v6-0006-TEST-add-basic-mxidoff64-tests-005_mxidoff.pl.patch (10.6K, 8-v6-0006-TEST-add-basic-mxidoff64-tests-005_mxidoff.pl.patch)
From 386cfe747bc4ccd867f3e27f5f7669c8eb7692f3 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Sat, 2 Nov 2024 10:46:16 +0300
Subject: [PATCH v6 6/6] TEST: add basic mxidoff64 tests 005_mxidoff.pl
---
src/bin/pg_upgrade/t/005_mxidoff.pl | 389 ++++++++++++++++++++++++++++
1 file changed, 389 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/005_mxidoff.pl
diff --git a/src/bin/pg_upgrade/t/005_mxidoff.pl b/src/bin/pg_upgrade/t/005_mxidoff.pl
new file mode 100644
index 0000000000..e595870543
--- /dev/null
+++ b/src/bin/pg_upgrade/t/005_mxidoff.pl
@@ -0,0 +1,389 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if (!defined($ENV{oldinstall}))
+{
+ die "oldinstall is not defined";
+}
+
+sub mxid_prepare
+{
+ my ($node) = @_;
+
+ $node->safe_psql('postgres',
+ q(
+ CREATE TABLE FOO(BAR INT PRIMARY KEY, BAZ INT);
+ CREATE OR REPLACE PROCEDURE MXIDFILLER(N_STEPS INT DEFAULT 1000)
+ LANGUAGE PLPGSQL
+ AS $$
+ BEGIN
+ FOR I IN 1..N_STEPS LOOP
+ UPDATE FOO SET BAZ = RANDOM(1, 1000)
+ WHERE BAR IN (SELECT BAR FROM FOO TABLESAMPLE BERNOULLI(80));
+ COMMIT;
+ END LOOP;
+ END;$$;
+ INSERT INTO FOO (BAR, BAZ) SELECT ID, ID FROM GENERATE_SERIES(1, 512) ID;
+ ));
+}
+
+sub mxid_fill
+{
+ my ($node) = @_;
+
+ $node->safe_psql('postgres',
+ q(
+ BEGIN;
+ SELECT * FROM FOO FOR KEY SHARE;
+ PREPARE TRANSACTION 'A';
+ CALL MXIDFILLER(365);
+ COMMIT PREPARED 'A';
+ ),
+ timeout => 3600);
+}
+
+# Fetch latest multixact checkpoint values.
+sub multi_bounds
+{
+ my ($node) = @_;
+ my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+ my @control_data = split("\n", $stdout);
+ my $next = undef;
+ my $oldest = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiXactId:\s*(.*)$/mg)
+ {
+ $next = $1;
+ }
+
+ if ($_ =~ /^Latest checkpoint's oldestMultiXid:\s*(.*)$/mg)
+ {
+ $oldest = $1;
+ }
+
+ if (defined($oldest) && defined($next))
+ {
+ last;
+ }
+ }
+
+ die "Latest checkpoint's NextMultiXactId not found in control file!\n"
+ unless defined($next);
+
+ die "Latest checkpoint's oldestMultiXid not found in control file!\n"
+ unless defined($oldest);
+
+ return ($oldest, $next);
+}
+
+# List pg_multixact/offsets segments filenames.
+sub list_actual_multixact_offsets
+{
+ my ($node) = @_;
+ my $dir;
+
+ opendir($dir, $node->data_dir . '/pg_multixact/offsets') or die $!;
+ my @list = sort grep { /^[0-9A-F]+$/ } readdir $dir;
+ closedir $dir;
+
+ return @list;
+}
+
+use constant SIZEOF_MULTI_XACT_OFFSET => 8;
+use constant BLCKSZ => 8192;
+use constant MULTIXACT_OFFSETS_PER_PAGE => BLCKSZ / SIZEOF_MULTI_XACT_OFFSET;
+use constant SLRU_PAGES_PER_SEGMENT => 2;
+
+# See src/backend/access/transam/multixact.c
+sub MultiXactIdToOffsetSegment
+{
+ my ($multi) = @_;
+
+ return int($multi / MULTIXACT_OFFSETS_PER_PAGE / SLRU_PAGES_PER_SEGMENT);
+}
+
+# Validate pg_multixact/offsets segments conversion.
+sub validate_multixact_offsets
+{
+ my ($old, $new, $oldnode) = @_;
+ my ($oldest, $next) = multi_bounds($oldnode);
+ my $maxsegno = MultiXactIdToOffsetSegment($next);
+ my $maxsegname = sprintf("%04X", $maxsegno);
+
+ print(">>>>>>>>>\n");
+ foreach my $segname ( @$old )
+ {
+ my $segno = hex($segname) * 2;
+ my $converted1 = sprintf("%04X", $segno);
+ my $converted2 = sprintf("%04X", $segno + 1);
+
+ print "[${segname}] -> [${converted1}, ${converted2}] \n";
+ # Skip the last segment as it may be incomplete.
+ if ($converted1 ne $maxsegname)
+ {
+ die "Segment ${segname} is not properly converted"
+ unless
+ (grep { $converted1 eq $_ } @$new) and
+ (grep { $converted2 eq $_ } @$new);
+ }
+ }
+ print(">>>>>>>>>\n");
+
+ return 1;
+}
+
+#
+# Select tests to run.
+#
+my @tests = (0, 1, 2, 3);
+
+# =============================================================================
+# CASE 0
+#
+# There must be several segments starting from zero.
+# =============================================================================
+SKIP:
+{
+ skip "case 0", 0
+ unless ( grep( /^0$/, @tests ) );
+
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node0',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1);
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node0');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case0: offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT 1));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT 1));
+ is($oldval, $newval, "case0: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+# =============================================================================
+# CASE 1
+#
+# There must be several segments starting from zero.
+# =============================================================================
+SKIP:
+{
+ skip "case 1", 1
+ unless ( grep( /^1$/, @tests ) );
+
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node1',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1);
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+ $oldnode->start();
+
+ mxid_prepare($oldnode);
+ mxid_fill($oldnode);
+
+ $oldnode->safe_psql('postgres', q(CHECKPOINT));
+ $oldnode->stop();
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node1');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case1: offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldval, $newval, "case1: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+# =============================================================================
+# CASE 2
+#
+# Non-standard oldestMultiXid and NextMultiXactId.
+# There must be several segments starting from some value.
+# =============================================================================
+SKIP:
+{
+ skip "case 2", 2
+ unless ( grep( /^2$/, @tests ) );
+
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node2',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1,
+ extra => [
+ '-m', '0x123000', '-o', '0x123000'
+ ]);
+
+ # Fixup MOX patch quirk
+ unlink $oldnode->data_dir . '/pg_multixact/members/0000';
+ unlink $oldnode->data_dir . '/pg_multixact/offsets/0000';
+
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+ $oldnode->start();
+
+ mxid_prepare($oldnode);
+ mxid_fill($oldnode);
+
+ $oldnode->safe_psql('postgres', q(CHECKPOINT));
+ $oldnode->stop();
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node2');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case2: non-standard offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldval, $newval, "case2: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+# =============================================================================
+# CASE 3
+#
+# Non-standard oldestMultiXid and NextMultiXactId.
+# =============================================================================
+SKIP:
+{
+ skip "case 3", 3
+ unless ( grep( /^3$/, @tests ) );
+ chdir ${PostgreSQL::Test::Utils::tmp_check};
+ my $oldnode = PostgreSQL::Test::Cluster->new('old_node3',
+ install_path => $ENV{oldinstall});
+ $oldnode->init(force_initdb => 1,
+ extra => [
+ '-m', '0xFFFF0000', '-o', '0xFFFF0000'
+ ]);
+
+ # Fixup MOX patch quirk
+ unlink $oldnode->data_dir . '/pg_multixact/members/0000';
+ unlink $oldnode->data_dir . '/pg_multixact/offsets/0000';
+
+ $oldnode->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+ $oldnode->append_conf('postgresql.conf', 'fsync = off');
+ $oldnode->start();
+
+ mxid_prepare($oldnode);
+ mxid_fill($oldnode);
+ mxid_fill($oldnode);
+
+ $oldnode->safe_psql('postgres', q(CHECKPOINT));
+ $oldnode->stop();
+
+ my $newnode = PostgreSQL::Test::Cluster->new('new_node3');
+ $newnode->init();
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ my @o = list_actual_multixact_offsets($oldnode);
+ my @n = list_actual_multixact_offsets($newnode);
+ ok(validate_multixact_offsets(\@o, \@n, $oldnode),
+ "case3: multi wrap, non-standard offsets segments matched");
+
+ $oldnode->start();
+ $newnode->start();
+
+ # just in case...
+ my $oldval = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newval = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldval, $newval, "case3: select eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+}
+
+done_testing();
--
2.43.0
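The test above expects each old offsets segment N to show up as segments 2N and 2N+1 after pg_upgrade, since every offset grows from 4 to 8 bytes and a page therefore holds half as many. The following is a minimal sketch of that arithmetic, not code from the patch: the function names are made up for this example, and it assumes BLCKSZ is 8192 and SLRU_PAGES_PER_SEGMENT is 2 on both the old and the new cluster, as in the Perl constants.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192
#define SLRU_PAGES_PER_SEGMENT 2    /* test value assumed for both clusters */

/* Segment number holding the offset of a given multixact id. */
static int64_t
multi_to_offset_segment(uint64_t multi, size_t offset_size)
{
    uint64_t offsets_per_page = BLCKSZ / offset_size;

    return (int64_t) (multi / offsets_per_page / SLRU_PAGES_PER_SEGMENT);
}

/* An old segment with 4-byte offsets maps onto two segments with 8-byte ones. */
static void
converted_segments(int64_t old_segno, int64_t new_segno[2])
{
    new_segno[0] = old_segno * 2;
    new_segno[1] = old_segno * 2 + 1;
}

/* Format a segment number the way the test does, as at least four hex digits. */
static void
segment_name(int64_t segno, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%04llX", (unsigned long long) segno);
}
```

With the starting multixact id 0x123000 used in case 2, the old layout places it in segment 0x123 and the new layout in segment 0x246, which matches the 2N rule.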
[application/octet-stream] 0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch (784B, 9-0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch)
From 57f96bdfe7b78794e7abe8802550e4a31e6c9370 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Fri, 8 Nov 2024 20:56:27 +0300
Subject: [PATCH 2/2] TEST: lower SLRU_PAGES_PER_SEGMENT
---
src/include/access/slru.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 97e612cd10..74dd54819d 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -36,7 +36,7 @@
* take no explicit notice of that fact in slru.c, except when comparing
* segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
*/
-#define SLRU_PAGES_PER_SEGMENT 32
+#define SLRU_PAGES_PER_SEGMENT 2
/*
* Page status codes. Note that these do not include the "dirty" bit.
--
2.43.0
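Lowering SLRU_PAGES_PER_SEGMENT presumably lets the test produce several offsets segments with far fewer multixacts. A small sketch of the sizes involved follows. The helper names are hypothetical, and BLCKSZ is assumed to be the default 8192.

```c
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192     /* assumed default block size */

/* Size in bytes of one SLRU segment file. */
static uint64_t
segment_bytes(unsigned pages_per_segment)
{
    return (uint64_t) pages_per_segment * BLCKSZ;
}

/* Number of multixact offsets stored in one segment. */
static uint64_t
multis_per_segment(unsigned pages_per_segment, size_t offset_size)
{
    return (uint64_t) pages_per_segment * (BLCKSZ / offset_size);
}
```

Under these assumptions a stock segment of 32 pages holds 32768 eight-byte offsets, while a 2-page segment holds only 2048, so a new segment is started sixteen times as often.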
[application/octet-stream] 0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch (33.0K, 10-0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch)
From 34623803146a152796b611421dd9684e4fefa785 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH 1/2] Add initdb option to initialize cluster with non-standard
xid/mxid/mxoff.
Until now, testing database cluster wraparound has not been easy, because
initdb always initializes the cluster with the default xid/mxid/mxoff. An
option to specify any valid xid/mxid/mxoff at cluster initialization makes this easier.
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 +++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++-
src/backend/bootstrap/bootstrap.c | 50 +++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 ++-
src/backend/tcop/postgres.c | 53 ++++++++++-
src/bin/initdb/initdb.c | 107 +++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++
src/bin/pg_amcheck/t/004_verify_heapam.pl | 35 +++----
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +-
src/test/regress/pg_regress.c | 3 +-
src/test/xid-64/t/001_test_large_xids.pl | 54 +++++++++++
17 files changed, 460 insertions(+), 32 deletions(-)
create mode 100644 src/test/xid-64/t/001_test_large_xids.pl
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the commit log page that holds nextXid */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..017eff07bd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2035,6 +2035,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -2046,6 +2047,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the offsets log page that holds nextMXact */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -2058,7 +2079,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the members log page that holds nextOffset */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we're not starting from offset zero, initialize dummy multixacts to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ed59dfce89..38165eb796 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -216,7 +216,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -271,12 +271,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 8bee1fb664..af4b004e04 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -562,7 +562,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -659,10 +659,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -713,6 +721,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index aac0b96bbc..4636d99b2f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..1cc54392e5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^62-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^62-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^62-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Reject the value rather than silently setting mxid to
+ * FirstMultiXactId, although doing so would be harmless.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtoull(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Reject the value rather than silently setting xid to
+ * FirstNormalTransactionId, although doing so would be harmless.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index 95fe6e6d3b..93eefd0479 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -320,6 +320,8 @@ my $relfrozenxid = $node->safe_psql('postgres',
q(select relfrozenxid from pg_class where relname = 'test'));
my $datfrozenxid = $node->safe_psql('postgres',
q(select datfrozenxid from pg_database where datname = 'postgres'));
+my $datminmxid = $node->safe_psql('postgres',
+ q(select datminmxid from pg_database where datname = 'postgres'));
# Sanity check that our 'test' table has a relfrozenxid newer than the
# datfrozenxid for the database, and that the datfrozenxid is greater than the
@@ -454,40 +456,39 @@ for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
# Expected corruption report
push @expected,
- qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ qr/${header}xmin $xmin precedes relation freeze threshold \d+/;
}
elsif ($offnum == 2)
{
# Corruptly set xmin < datfrozenxid
- my $xmin = 3;
+ my $xmin = $datfrozenxid - 12;
$tup->{t_xmin} = $xmin;
$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
push @expected,
- qr/${$header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ qr/${$header}xmin $xmin precedes oldest valid transaction ID \d+/;
}
elsif ($offnum == 3)
{
- # Corruptly set xmin < datfrozenxid, further back, noting circularity
- # of xid comparison.
- my $xmin = 4026531839;
+ # Corruptly set xmin > next transaction id.
+ my $xmin = $relfrozenxid + 1000000;
$tup->{t_xmin} = $xmin;
$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
push @expected,
- qr/${$header}xmin ${xmin} precedes oldest valid transaction ID 0:\d+/;
+ qr/${$header}xmin $xmin equals or exceeds next valid transaction ID \d+/;
}
elsif ($offnum == 4)
{
- # Corruptly set xmax < relminmxid;
- my $xmax = 4026531839;
+ # Corruptly set xmax > next transaction id.
+ my $xmax = $relfrozenxid + 1000000;
$tup->{t_xmax} = $xmax;
$tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
push @expected,
- qr/${$header}xmax ${xmax} precedes oldest valid transaction ID 0:\d+/;
+ qr/${$header}xmax $xmax equals or exceeds next valid transaction ID \d+/;
}
elsif ($offnum == 5)
{
@@ -590,31 +591,33 @@ for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
# Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
$tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
$tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
- $tup->{t_xmax} = 4;
+ my $xmax = $datminmxid + 1000000;
+ $tup->{t_xmax} = $xmax;
push @expected,
- qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ qr/${header}multitransaction ID $xmax equals or exceeds next valid multitransaction ID \d+/;
}
elsif ($offnum == 15)
{
# Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
$tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
$tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
- $tup->{t_xmax} = 4000000000;
+ my $xmax = $datminmxid - 10;
+ $tup->{t_xmax} = $xmax;
push @expected,
- qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ qr/${header}multitransaction ID $xmax precedes relation minimum multitransaction ID threshold \d+/;
}
elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
{
# Corruptly set xmin > next_xid to be in the future.
- my $xmin = 123456;
+ my $xmin = $relfrozenxid + 1000000;
$tup->{t_xmin} = $xmin;
$tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
push @expected,
- qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/;
+ qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID \d+/;
}
elsif ($offnum == 17)
{
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..218afeeb3b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint32 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index e5526c7565..79df6faeb9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -643,7 +643,9 @@ sub init
{
note("initializing database system by running initdb");
PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
- 'trust', '-N', @{ $params{extra} });
+ 'trust', '-N',
+ '-x', '124983', '-m', '242236', '-o', '359488',
+ @{ $params{extra} });
}
else
{
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 0e40ed32a2..3511c4b500 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2333,7 +2333,8 @@ regression_main(int argc, char *argv[],
note("initializing database system by running initdb");
appendStringInfo(&cmd,
- "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
+ "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync"
+ " -x 124983 -m 242236 -o 359488",
bindir ? bindir : "",
bindir ? "/" : "",
temp_instance);
diff --git a/src/test/xid-64/t/001_test_large_xids.pl b/src/test/xid-64/t/001_test_large_xids.pl
new file mode 100644
index 0000000000..4c7dbc6cb1
--- /dev/null
+++ b/src/test/xid-64/t/001_test_large_xids.pl
@@ -0,0 +1,54 @@
+# Tests for large xid values
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use bigint;
+
+sub command_output
+{
+ my ($cmd) = @_;
+ my ($stdout, $stderr);
+ print("# Running: " . join(" ", @{$cmd}) . "\n");
+ my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ ok($result, "@$cmd exit code 0");
+ is($stderr, '', "@$cmd no stderr");
+ return $stdout;
+}
+
+my $START_VAL = 2**32;
+my $MAX_VAL = 2**62;
+
+my $ixid = $START_VAL + int(rand($MAX_VAL - $START_VAL));
+my $imxid = $START_VAL + int(rand($MAX_VAL - $START_VAL));
+my $imoff = $START_VAL + int(rand($MAX_VAL - $START_VAL));
+
+# Initialize master node with the random xid-related parameters
+my $node = PostgreSQL::Test::Cluster->new('master');
+$node->init(extra => [ "--xid=$ixid", "--multixact-id=$imxid", "--multixact-offset=$imoff" ]);
+$node->start;
+
+# Check the xid-related parameters reported by pg_controldata
+my $pgcd_output = command_output(
+ [ 'pg_controldata', '-D', $node->data_dir ] );
+print($pgcd_output); print("\n");
+ok($pgcd_output =~ qr/Latest checkpoint's NextXID:\s*(\d+)/, "XID found");
+my ($nextxid) = ($1);
+ok($nextxid >= $ixid && $nextxid < $ixid + 1000,
+ "Latest checkpoint's NextXID ($nextxid) is close to the initial xid ($ixid).");
+ok($pgcd_output =~ qr/Latest checkpoint's NextMultiXactId:\s*(\d+)/, "MultiXactId found");
+my ($nextmxid) = ($1);
+ok($nextmxid >= $imxid && $nextmxid < $imxid + 1000,
+ "Latest checkpoint's NextMultiXactId ($nextmxid) is close to the initial multiXactId ($imxid).");
+ok($pgcd_output =~ qr/Latest checkpoint's NextMultiOffset:\s*(\d+)/, "MultiOffset found");
+my ($nextmoff) = ($1);
+ok($nextmoff >= $imoff && $nextmoff < $imoff + 1000,
+ "Latest checkpoint's NextMultiOffset ($nextmoff) is close to the initial multiOffset ($imoff).");
+
+# Run pgbench to check whether the database is working properly
+$node->command_ok(
+ [ qw(pgbench --initialize --no-vacuum --scale=10) ],
+ 'pgbench finished without errors');
+
+done_testing();
\ No newline at end of file
--
2.43.0
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-11-11 23:31 ` Heikki Linnakangas <[email protected]>
2024-11-13 15:44 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Heikki Linnakangas @ 2024-11-11 23:31 UTC (permalink / raw)
To: Maxim Orlov <[email protected]>; wenhui qiu <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
On 08/11/2024 20:10, Maxim Orlov wrote:
> Sorry for the late reply. There was a problem in upgrade with offset
> wraparound. Here is a fixed version. A test is also added. I decided to use my
> old patch to set non-standard multixacts for the old cluster, fill it
> with data and do pg_upgrade.
The wraparound logic is still not correct. To test, I created a cluster
where multixids have wrapped around, so that:
$ ls -l data-old/pg_multixact/offsets/
total 720
-rw------- 1 heikki heikki 212992 Nov 12 01:11 0000
-rw-r--r-- 1 heikki heikki 262144 Nov 12 00:55 FFFE
-rw------- 1 heikki heikki 262144 Nov 12 00:56 FFFF
After running pg_upgrade:
$ ls -l data-new/pg_multixact/offsets/
total 1184
-rw------- 1 heikki heikki 155648 Nov 12 01:12 0001
-rw------- 1 heikki heikki 262144 Nov 12 01:11 1FFFD
-rw------- 1 heikki heikki 262144 Nov 12 01:11 1FFFE
-rw------- 1 heikki heikki 262144 Nov 12 01:11 1FFFF
-rw------- 1 heikki heikki 262144 Nov 12 01:11 20000
-rw------- 1 heikki heikki 155648 Nov 12 01:11 20001
That's not right. The segments 20000 and 20001 were created by the new
pg_upgrade conversion code from old segment '0000'. But multixids are
still 32-bit values, so after segment 1FFFF, you should still wrap
around to 0000. The new segments should be '0000' and '0001'. The
segment '0001' is created when postgres is started after upgrade, but
it's created from scratch and doesn't contain the upgraded values.
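
The expected segment names can be checked with a little arithmetic. The sketch below is only an illustration, not code from the patch. It assumes 8192-byte SLRU pages and 32 pages per segment (which matches the 262144-byte files listed above), 4-byte offset entries before the patch and 8-byte entries after it. The helper names are made up for the example.

```python
# Assumed constants (not taken from the patch): 8 kB pages, 32 pages/segment.
BLCKSZ = 8192
SLRU_PAGES_PER_SEGMENT = 32
MAX_MULTIXACT_ID = 0xFFFFFFFF  # multixids remain 32-bit values


def offsets_segment(multi: int, entry_size: int) -> int:
    """Return the offsets-SLRU segment number that holds `multi`."""
    multi &= MAX_MULTIXACT_ID  # wrap the multixid at 2**32
    entries_per_page = BLCKSZ // entry_size
    return multi // (entries_per_page * SLRU_PAGES_PER_SEGMENT)


def segment_name(segno: int) -> str:
    """Format a segment number as an upper-case hex file name."""
    return f"{segno:04X}"
```

Under these assumptions the last multixid lands in segment FFFF with 4-byte entries and in 1FFFF with 8-byte entries, and the multixid after it maps back to 0000. Leaving out the 32-bit mask is what would produce a name like 20000.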
When I try to select from a table after upgrade that contains
post-wraparound multixids:
TRAP: failed Assert("offset != 0"), File:
"../src/backend/access/transam/multixact.c", Line: 1353, PID: 63386
On a different note, I'm surprised you're rewriting member segments from
scratch, parsing all the individual member groups and writing them out
again. There's no change to the members file format, except for the
numbering of the files, so you could just copy the files under the new
names without paying attention to the contents. It's not wrong to parse
them in detail, but I'd assume that it would be simpler not to.
> Here is how to test. All the patches are for 14e87ffa5c543b5f3 master
> branch.
> 1) Get the 14e87ffa5c543b5f3 master branch apply patches 0001-Add-
> initdb-option-to-initialize-cluster-with-non-sta.patch and 0002-TEST-
> lower-SLRU_PAGES_PER_SEGMENT.patch
> 2) Get the 14e87ffa5c543b5f3 master branch in a separate directory and
> apply v6 patch set.
> 3) Build two branches.
> 4) Use ENV oldinstall to run the test: PROVE_TESTS=t/005_mxidoff.pl
> oldinstall=/home/orlov/proj/pgsql-new
> PG_TEST_NOCLEAN=1 make check -C src/bin/pg_upgrade/
>
> Maybe I'll make a shell script to automate these steps if required.
Yeah, I think we need something to automate this. I did the testing
manually. I used the attached python script to consume multixids faster,
but it's still tedious.
I used pg_resetwal to quickly create a cluster that's close to multixid
wraparound:
initdb -D data
pg_resetwal -D data -m 4294900001,4294900000
dd if=/dev/zero of=data/pg_multixact/offsets/FFFE bs=8192 count=32
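
The file name and size used in the dd command follow from the pre-patch offsets layout. A quick check, assuming 4-byte offset entries, 8192-byte pages and 32 pages per segment (these constants are assumptions for the sketch, not read from the cluster):

```python
# Assumed pre-patch layout: 4-byte entries, 8 kB pages, 32 pages per segment.
BLCKSZ = 8192
PAGES_PER_SEGMENT = 32
OLD_ENTRY_SIZE = 4

oldest_multi = 4294900000  # second value given to pg_resetwal -m above

entries_per_segment = (BLCKSZ // OLD_ENTRY_SIZE) * PAGES_PER_SEGMENT
segment_file = f"{oldest_multi // entries_per_segment:04X}"
segment_bytes = BLCKSZ * PAGES_PER_SEGMENT
```

Here segment_file comes out as FFFE and segment_bytes as 262144, which is bs=8192 times count=32.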
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
[text/x-python] multixids.py (1.8K, 2-multixids.py)
download | inline:
import sys
import threading
import psycopg2


def test_multixact(tblname: str):
    with psycopg2.connect() as conn:
        cur = conn.cursor()
        cur.execute(
            f"""
            DROP TABLE IF EXISTS {tblname};
            CREATE TABLE {tblname}(i int primary key, n_updated int) WITH (autovacuum_enabled=false);
            INSERT INTO {tblname} select g, 0 from generate_series(1, 50) g;
            """
        )

    # Lock entries using parallel connections in a round-robin fashion.
    nclients = 50
    update_every = 97
    connections = []
    for _ in range(nclients):
        # Do not turn on autocommit. We want to hold the key-share locks.
        conn = psycopg2.connect()
        connections.append(conn)

    # On each iteration, we commit the previous transaction on a connection,
    # and issue another select. Each SELECT generates a new multixact that
    # includes the new XID, and the XIDs of all the other parallel transactions.
    # This generates enough traffic on both multixact offsets and members SLRUs
    # to cross page boundaries.
    for i in range(20000):
        conn = connections[i % nclients]
        conn.commit()

        # Perform some non-key UPDATEs too, to exercise different multixact
        # member statuses.
        if i % update_every == 0:
            conn.cursor().execute(f"update {tblname} set n_updated = n_updated + 1 where i = {i % 50}")
        else:
            conn.cursor().execute(f"select * from {tblname} for key share")


#nthreads=10
#
#threads = []
#for threadno in range(nthreads):
#    tblname = f"tbl{threadno}"
#    t = threading.Thread(target=test_multixact, args=(tblname,))
#    t.start()
#    threads.append(t)
#
#for threadno in range(nthreads):
#    threads[threadno].join()

test_multixact(sys.argv[1])
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
@ 2024-11-13 15:44 ` Maxim Orlov <[email protected]>
2024-11-15 08:41 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-15 11:06 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
0 siblings, 2 replies; 21+ messages in thread
From: Maxim Orlov @ 2024-11-13 15:44 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: wenhui qiu <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
On Tue, 12 Nov 2024 at 02:31, Heikki Linnakangas <[email protected]> wrote:
> The wraparound logic is still not correct.
Yep, my fault. I forgot to reset the segment counter when wraparound happens.
Fixed.
> When I try to select from a table after upgrade that contains
> post-wraparound multixids:
>
> TRAP: failed Assert("offset != 0"), File:
> "../src/backend/access/transam/multixact.c", Line: 1353, PID: 63386
>
The problem was in converting offset segments. The new_entry index should
also bypass the invalid offset (0) value. Fixed.
>
> On a different note, I'm surprised you're rewriting member segments from
> scratch, parsing all the individual member groups and writing them out
> again. There's no change to the members file format, except for the
> numbering of the files, so you could just copy the files under the new
> names without paying attention to the contents. It's not wrong to parse
> them in detail, but I'd assume that it would be simpler not to.
>
Yes, at the beginning I also thought that it would be possible to get by
with simple copying. But in case of wraparound, we must "bypass" the invalid
zero offset value. The old 32-bit offsets wrapped at 2^32, so 0 values
appear in multixact.c and must be handled, that is, bypassed. Once we
switch to 64-bit offsets, we have two options:
1). Bypass every ((uint32) offset == 0) value in multixact.c;
2). Convert members and bypass the invalid value once.
The first option seems too weird to me. So, we have to repack members and
bypass the invalid value.
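
The difference between the two layouts can be modelled in a few lines. This is a simplified sketch, not the conversion code from the patch: it only tracks the start offset of each member group, and the function names and the group-size input are hypothetical.

```python
UINT32_MASK = 0xFFFFFFFF
INVALID_OFFSET = 0  # offset 0 is reserved as the invalid value


def old_start_offsets(next_offset, group_sizes):
    """Model 32-bit allocation: wrap at 2**32; a group never starts at 0."""
    starts = []
    for nmembers in group_sizes:
        if next_offset == INVALID_OFFSET:
            next_offset = 1  # skip the invalid start offset after a wrap
        starts.append(next_offset)
        next_offset = (next_offset + nmembers) & UINT32_MASK
    return starts


def repacked_start_offsets(first_offset, group_sizes):
    """Model 64-bit repacking: groups are laid out back to back, no wrap."""
    starts = []
    next_offset = first_offset
    for nmembers in group_sizes:
        starts.append(next_offset)
        next_offset += nmembers
    return starts
```

For three groups of 2, 3 and 1 members starting at 0xFFFFFFFE, the 32-bit model gives start offsets 0xFFFFFFFE, 1 and 4, while the repacked model gives 0xFFFFFFFE, 0x100000000 and 0x100000003. The zero slot has to be skipped only in the first case, which is why a plain file copy does not line up with the new offsets.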
All patches are for master@38c18710b37a2d
--
Best regards,
Maxim Orlov.
Attachments:
[application/octet-stream] v7-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 3-v7-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
download | inline diff:
From fdb7e2eee33dfb5df714d8d16112d4c907475d78 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v7 1/5] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 05c738d661..727b6e744f 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -876,8 +876,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.43.0
[application/octet-stream] v7-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (8.8K, 4-v7-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
download | inline diff:
From 1b630d2f82ce69cd8479aaaec7dfe266a77fb718 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 23 Oct 2024 18:23:39 +0300
Subject: [PATCH v7 4/5] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is UINT64_MAX now, the MULTIXACT_MEMBER_SAFE_THRESHOLD and
MULTIXACT_MEMBER_DANGER_THRESHOLD values are no longer meaningful. Thus,
MultiXactMemberFreezeThreshold is not needed either.
Instead, switch to a MULTIXACT_MEMBER_AUTOVAC_THRESHOLD (0xFFFFFFFF, i.e. 2^32 - 1)
members threshold. It is used to determine whether we need to force autovacuum.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 117 +++----------------------
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/include/access/multixact.h | 1 -
4 files changed, 15 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 48e1c0160a..a817f539ee 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -204,10 +204,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release the disk space and reduce table bloat
+ * if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -2616,15 +2620,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine if we need to vacuum for member or not.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2712,10 +2714,10 @@ SetOffsetVacuumLimit(bool is_startup)
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2761,101 +2763,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 86f36b3695..e7506e268a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,7 +1133,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..180bb7e96e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1915,7 +1915,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
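
As a rough illustration of the check that SetOffsetVacuumLimit() performs
after this patch, the standalone sketch below reproduces the comparison
outside the server. The function name member_space_needs_autovacuum and the
plain stdint types are assumptions made for this example; only the threshold
value and the comparison itself are taken from the diff above.

```c
#include <assert.h>   /* only needed for the usage checks mentioned below */
#include <stdbool.h>
#include <stdint.h>

/* Same value as MULTIXACT_MEMBER_AUTOVAC_THRESHOLD in the patch: 2^32 - 1. */
#define MEMBER_AUTOVAC_THRESHOLD UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical helper, not part of the patch: mirrors the return expression
 * of SetOffsetVacuumLimit().  If the oldest offset is unknown we assume that
 * autovacuum is needed; otherwise we compare the number of members in use
 * against the threshold.
 */
bool
member_space_needs_autovacuum(bool oldest_offset_known,
                              uint64_t next_offset,
                              uint64_t oldest_offset)
{
    if (!oldest_offset_known)
        return true;

    return next_offset - oldest_offset > MEMBER_AUTOVAC_THRESHOLD;
}
```

Note that the comparison is strict: with exactly 2^32 - 1 members in use the
helper still returns false, and it only returns true once the difference
reaches 2^32.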
[application/octet-stream] v7-0005-TEST-bump-catver.patch (1.1K, 5-v7-0005-TEST-bump-catver.patch)
From f229177951e7c233e4f827cd1996f9ae9eac8f88 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 13 Nov 2024 16:34:34 +0300
Subject: [PATCH v7 5/5] TEST: bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/catalog/catversion.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..18faedc963 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411112
/*
* large object chunk size added to pg_controldata,
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5dd91e190a..3d09caf5ae 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411111
+#define CATALOG_VERSION_NO 202411112
#endif
--
2.43.0
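
For context, pg_upgrade compares both clusters' catalog versions against
MULTIXACTOFFSET_FORMATCHANGE_CAT_VER in copy_xact_xlog_xid() (patch 3/5) to
decide whether the multixact SLRU files must be converted or can simply be
copied. A minimal sketch of that decision, assuming the bumped value above;
the function and macro names here are hypothetical and exist only for the
example:

```c
#include <assert.h>   /* only needed for the usage checks mentioned below */
#include <stdbool.h>
#include <stdint.h>

/* Value of MULTIXACTOFFSET_FORMATCHANGE_CAT_VER after the bump above. */
#define OFFSET_FORMATCHANGE_CAT_VER UINT32_C(202411112)

/*
 * Hypothetical helper, not part of the patch: returns true when the old
 * cluster still uses 32-bit multixact offsets and the new cluster expects
 * 64-bit ones, i.e. when the SLRU files have to be rewritten.
 */
bool
offsets_need_conversion(uint32_t old_cat_ver, uint32_t new_cat_ver)
{
    return old_cat_ver < OFFSET_FORMATCHANGE_CAT_VER &&
           new_cat_ver >= OFFSET_FORMATCHANGE_CAT_VER;
}
```

With these values, an upgrade from catalog version 202411111 to 202411112
takes the conversion path, while an upgrade between two clusters that are both
at 202411112 falls through to the plain file copy.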
[application/octet-stream] v7-0003-Make-pg_upgrade-convert-multixact-offsets.patch (18.6K, 6-v7-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
From 3f26d4d4d8aeb61729da3faf7506c7df2aa4347d Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v7 3/5] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
Author: Yura Sokolov <[email protected]>
---
src/backend/access/transam/multixact.c | 2 +-
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 42 +-
src/bin/pg_upgrade/pg_upgrade.h | 14 +-
src/bin/pg_upgrade/segresize.c | 529 +++++++++++++++++++++++++
6 files changed, 583 insertions(+), 6 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c51e03e832..48e1c0160a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1891,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- true);
+ false);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..1654e877c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,8 +750,42 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
- copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ /*
+ * If the old server predates MULTIXACTOFFSET_FORMATCHANGE_CAT_VER, it
+ * has 32-bit multixact offsets, which must be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ MultiXactOffset oldest_offset,
+ next_offset;
+
+ remove_new_subdir("pg_multixact/offsets", false);
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+ oldest_offset = convert_multixact_offsets();
+ check_ok();
+
+ remove_new_subdir("pg_multixact/members", false);
+ prep_status("Converting pg_multixact/members");
+ convert_multixact_members(oldest_offset);
+ check_ok();
+
+ next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+ if (oldest_offset)
+ {
+ if (next_offset < oldest_offset)
+ next_offset += ((MultiXactOffset) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ {
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ }
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +794,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..2c85ec1e94 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,8 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+MultiXactOffset convert_multixact_offsets(void);
+void convert_multixact_members(MultiXactOffset oldest_offset);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..1f02bb8aea
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,529 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * A simple iterator over the pages of SLRU segments. The idea is to specify
+ * the starting segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+} SlruSegState;
+
+/*
+ * Get SLRU segment file name from state.
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFF));
+
+ return psprintf("%s/%04X", state->dir, (unsigned int) (state->segno));
+}
+
+/*
+ * Create SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file does exist, read page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found, which avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+}
+
+typedef uint32 MultiXactOffsetOld;
+
+#define MaxMultiXactOffsetOld ((MultiXactOffsetOld) 0xFFFFFFFF)
+
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(MultiXactOffsetOld))
+#define MULTIXACT_OFFSETS_PER_PAGE_NEW (BLCKSZ / sizeof(MultiXactOffset))
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+MultiXactOffset
+convert_multixact_offsets(void)
+{
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ MultiXactOffsetOld oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE_NEW] = {0},
+ oldest_offset = 0;
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool oldest_offset_known = false;
+
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ /* Copy multi offsets reading only needed segment pages */
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound */
+#define OLD_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_OLD / SLRU_PAGES_PER_SEGMENT)
+ if (oldseg.segno > OLD_OFFSET_SEGNO_MAX)
+ {
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+ }
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Save oldest multi offset */
+ if (!oldest_offset_known)
+ {
+ oldest_offset = oldbuf[old_entry];
+ oldest_offset_known = true;
+ }
+
+ /* Skip wrapped-around invalid MultiXactIds */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+ Assert(new_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ new_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound (1 becomes 2^32) */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1 */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE_NEW)
+ {
+ /* Handle possible segment wraparound */
+#define NEW_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_NEW / SLRU_PAGES_PER_SEGMENT)
+ if (newseg.segno > NEW_OFFSET_SEGNO_MAX)
+ {
+ newseg.segno = 0;
+ newseg.pageno = 0;
+ }
+
+ /* Write new page */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE_NEW - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Use next_offset as oldest_offset, if oldest_multi == next_multi */
+ if (!oldest_offset_known)
+ {
+ Assert(oldest_multi == next_multi);
+ oldest_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ }
+
+ /* Release resources */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ return oldest_offset;
+}
+
+#define MXACT_MEMBERS_FLAG_BYTES 1
+
+#define MULTIXACT_MEMBERS_PER_GROUP 4
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (MULTIXACT_MEMBERS_PER_GROUP * (sizeof(TransactionId) + MXACT_MEMBERS_FLAG_BYTES))
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE \
+ (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERS_PER_GROUP * MULTIXACT_MEMBERGROUPS_PER_PAGE)
+#define MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP \
+ (MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP)
+
+typedef struct MultiXactMembersCtx
+{
+ SlruSegState seg;
+ char buf[BLCKSZ];
+ int group;
+ int member;
+ char *flag;
+ TransactionId *xid;
+} MultiXactMembersCtx;
+
+static void
+MultiXactMembersCtxInit(MultiXactMembersCtx *ctx)
+{
+ ctx->seg.dir = psprintf("%s/pg_multixact/members", new_cluster.pgdata);
+
+ ctx->group = 0;
+ ctx->member = 1; /* skip invalid zero offset */
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+
+ ctx->flag += ctx->member;
+ ctx->xid += ctx->member;
+}
+
+static void
+MultiXactMembersCtxAdd(MultiXactMembersCtx *ctx, char flag, TransactionId xid)
+{
+ /* Copy member's xid and flags to the new page */
+ *ctx->flag++ = flag;
+ *ctx->xid++ = xid;
+
+ if (++ctx->member < MULTIXACT_MEMBERS_PER_GROUP)
+ return;
+
+ /* Start next member group */
+ ctx->member = 0;
+
+ if (++ctx->group >= MULTIXACT_MEMBERGROUPS_PER_PAGE)
+ {
+ /* Write current page and start new */
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ ctx->group = 0;
+ memset(ctx->buf, 0, BLCKSZ);
+ }
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+}
+
+static void
+MultiXactMembersCtxFinit(MultiXactMembersCtx *ctx)
+{
+ if (ctx->flag > (char *) ctx->buf)
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ close_segment(&ctx->seg);
+
+ pfree(ctx->seg.dir);
+}
+
+/*
+ * Convert pg_multixact/members segments, offsets will start from 1.
+ *
+ */
+void
+convert_multixact_members(MultiXactOffset oldest_offset)
+{
+ MultiXactOffset next_offset,
+ offset;
+ SlruSegState oldseg = {0};
+ char oldbuf[BLCKSZ] = {0};
+ int oldidx;
+ MultiXactMembersCtx newctx = {0};
+
+ oldseg.dir = psprintf("%s/pg_multixact/members", old_cluster.pgdata);
+
+ next_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ /* Initialize the old starting position */
+ oldseg.pageno = oldest_offset / MULTIXACT_MEMBERS_PER_PAGE;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ /* Initialize new starting position */
+ MultiXactMembersCtxInit(&newctx);
+
+ /* Iterate through the original directory */
+ oldidx = oldest_offset % MULTIXACT_MEMBERS_PER_PAGE;
+ for (offset = oldest_offset; offset < next_offset;)
+ {
+ bool empty;
+ int oldlen;
+ int ngroups;
+ int oldgroup;
+ int oldmember;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Iterate through the old member groups */
+ ngroups = oldlen / MULTIXACT_MEMBERGROUP_SIZE;
+ oldmember = oldidx % MULTIXACT_MEMBERS_PER_GROUP;
+ oldgroup = oldidx / MULTIXACT_MEMBERS_PER_GROUP;
+ while (oldgroup < ngroups && offset < next_offset)
+ {
+ char *oldflag;
+ TransactionId *oldxid;
+ int i;
+
+ oldflag = (char *) oldbuf + oldgroup * MULTIXACT_MEMBERGROUP_SIZE;
+ oldxid = (TransactionId *)(oldflag + MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP);
+
+ oldxid += oldmember;
+ oldflag += oldmember;
+
+ /* Iterate through the old members */
+ for (i = oldmember;
+ i < MULTIXACT_MEMBERS_PER_GROUP && offset < next_offset;
+ i++)
+ {
+ MultiXactMembersCtxAdd(&newctx, *oldflag++, *oldxid++);
+
+ if (++offset == (uint64) 1 << 32)
+ {
+ Assert(i == MaxMultiXactOffsetOld % MULTIXACT_MEMBERS_PER_GROUP);
+ goto wraparound;
+ }
+ }
+
+ oldgroup++;
+ oldmember = 0;
+ }
+
+ oldidx = 0;
+
+ continue;
+
+wraparound:
+#define SEGNO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE / SLRU_PAGES_PER_SEGMENT)
+#define PAGENO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE % SLRU_PAGES_PER_SEGMENT)
+ Assert((oldseg.segno == SEGNO_MAX && oldseg.pageno == PAGENO_MAX + 1) ||
+ (oldseg.segno == SEGNO_MAX + 1 && oldseg.pageno == 0));
+
+ /* Switch to segment 0000 */
+ close_segment(&oldseg);
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+
+ /* skip invalid zero multi offset */
+ oldidx = 1;
+ }
+
+ MultiXactMembersCtxFinit(&newctx);
+
+ /* Release resources */
+ close_segment(&oldseg);
+
+ pfree(oldseg.dir);
+}
--
2.43.0
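
The arithmetic in segresize.c can be easier to follow in isolation. The
sketch below shows two pieces of it: rebasing an old 32-bit member offset so
that the new 64-bit offsets start from 1, and locating a multixact's entry in
the offsets SLRU for a given entry size. It is not code from the patch: the
names rebase_member_offset, locate_offset_slot and SlotPos are hypothetical,
and the block size of 8192 is an assumption matching the default BLCKSZ.

```c
#include <assert.h>   /* only needed for the usage checks mentioned below */
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ            8192   /* assumed default BLCKSZ */
#define SKETCH_PAGES_PER_SEGMENT 32     /* SLRU_PAGES_PER_SEGMENT */

/*
 * Rebase an old 32-bit offset relative to oldest_offset.  Offset 0 is
 * invalid, so the old space holds 2^32 - 1 usable values; an offset below
 * oldest_offset is treated as wrapped around and shifted by that amount.
 * The oldest offset itself maps to 1.
 */
uint64_t
rebase_member_offset(uint32_t old_offset, uint32_t oldest_offset)
{
    uint64_t offset = old_offset;

    if (offset < oldest_offset)
        offset += (UINT64_C(1) << 32) - 1;

    return offset - oldest_offset + 1;
}

/* Position of one entry inside the offsets SLRU (illustrative type). */
typedef struct SlotPos
{
    uint64_t segno;   /* segment file number */
    uint64_t pageno;  /* page within the segment */
    uint64_t entry;   /* entry index within the page */
} SlotPos;

/*
 * Locate the entry for a multixact id, given the size of one entry:
 * 4 bytes in the old layout, 8 bytes in the new one.
 */
SlotPos
locate_offset_slot(uint64_t multi, size_t entry_size)
{
    uint64_t per_page = SKETCH_BLCKSZ / entry_size;
    uint64_t page = multi / per_page;
    SlotPos  pos;

    pos.entry = multi % per_page;
    pos.segno = page / SKETCH_PAGES_PER_SEGMENT;
    pos.pageno = page % SKETCH_PAGES_PER_SEGMENT;
    return pos;
}
```

For example, under these assumptions multixact 100000 lives in segment 1,
page 16, entry 1696 of the old layout, and in segment 3, page 1, entry 672 of
the new one, because doubling the entry size halves the number of entries per
page.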
[application/octet-stream] v7-0002-Use-64-bit-multixact-offsets.patch (13.3K, 7-v7-0002-Use-64-bit-multixact-offsets.patch)
From 8d0b3a64804ba3b0c4104cd37907e2959934937b Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v7 2/5] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..e1b3187d0b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -664,7 +664,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-11-13 15:44 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-11-15 08:41 ` Maxim Orlov <[email protected]>
1 sibling, 0 replies; 21+ messages in thread
From: Maxim Orlov @ 2024-11-15 08:41 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: wenhui qiu <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
Here are the test scripts.
The generate.sh script generates a data directory with multiple clusters
in it. It calls multixids.py to generate the data. If you are not using
the system psql, consider setting the LD_LIBRARY_PATH environment variable
to specify the path to the lib directory.
OLDBIN=/.../pgsql-new ./generate.sh
Then test.sh is used to run various upgrades.
OLDBIN=/.../pgsql-old NEWBIN=/.../pgsql-new ./test.sh
I hope that helps!
--
Best regards,
Maxim Orlov.
Attachments:
[text/x-python-script] multixids.py (1.6K, 3-multixids.py)
download | inline:
#!/usr/bin/env python3

import sys;
import threading;
import psycopg2;

def test_multixact(tblname: str):
    with psycopg2.connect(dbname="postgres") as conn:
        cur = conn.cursor()
        cur.execute(
            f"""
            DROP TABLE IF EXISTS {tblname};
            CREATE TABLE {tblname}(i int primary key, n_updated int) WITH (autovacuum_enabled=false);
            INSERT INTO {tblname} select g, 0 from generate_series(1, 50) g;
            """
        )

    # Lock entries using parallel connections in a round-robin fashion.
    nclients = 50
    update_every = 97
    connections = []
    for _ in range(nclients):
        # Do not turn on autocommit. We want to hold the key-share locks.
        conn = psycopg2.connect(dbname="postgres")
        connections.append(conn)

    # On each iteration, we commit the previous transaction on a connection,
    # and issue another select. Each SELECT generates a new multixact that
    # includes the new XID, and the XIDs of all the other parallel transactions.
    # This generates enough traffic on both multixact offsets and members SLRUs
    # to cross page boundaries.
    for i in range(20000):
        conn = connections[i % nclients]
        conn.commit()

        # Perform some non-key UPDATEs too, to exercise different multixact
        # member statuses.
        if i % update_every == 0:
            conn.cursor().execute(f"update {tblname} set n_updated = n_updated + 1 where i = {i % 50}")
        else:
            conn.cursor().execute(f"select * from {tblname} for key share")

test_multixact(sys.argv[1])
[application/x-sh] generate.sh (1.9K, 4-generate.sh)
download
[application/x-sh] test.sh (1.8K, 5-test.sh)
download
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-11-13 15:44 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-11-15 11:06 ` Heikki Linnakangas <[email protected]>
2024-11-15 16:19 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
1 sibling, 1 reply; 21+ messages in thread
From: Heikki Linnakangas @ 2024-11-15 11:06 UTC (permalink / raw)
To: Maxim Orlov <[email protected]>; +Cc: wenhui qiu <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
On 13/11/2024 17:44, Maxim Orlov wrote:
> On Tue, 12 Nov 2024 at 02:31, Heikki Linnakangas <[email protected]
> <mailto:[email protected]>> wrote:
> On a different note, I'm surprised you're rewriting member segments
> from
> scratch, parsing all the individual member groups and writing them out
> again. There's no change to the members file format, except for the
> numbering of the files, so you could just copy the files under the new
> names without paying attention to the contents. It's not wrong to parse
> them in detail, but I'd assume that it would be simpler not to.
>
> Yes, at the beginning I also thought that it would be possible to get by
> with simple copying. But in case of wraparound, we must "bypass" the invalid
> zero offset value. See, old 32-bit offsets are wrapped at 2^32, thus 0
> values appear in multixact.c. So, they must be handled. Bypassed, in fact.
> When we switch to 64-bit offsets, we have two options:
> 1). Bypass every ((uint32) offset == 0) value in multixact.c;
> 2). Convert members and bypass the invalid value once.
>
> The first option seems too weird to me. So, we have to repack members
> and bypass the invalid value.
Hmm, so if I understand correctly, this is related to how we determine
the length of the members array, by looking at the next multixid's
offset. This is explained in GetMultiXactIdMembers:
> /*
> * Find out the offset at which we need to start reading MultiXactMembers
> * and the number of members in the multixact. We determine the latter as
> * the difference between this multixact's starting offset and the next
> * one's. However, there are some corner cases to worry about:
> *
> * 1. This multixact may be the latest one created, in which case there is
> * no next one to look at. In this case the nextOffset value we just
> * saved is the correct endpoint.
> *
> * 2. The next multixact may still be in process of being filled in: that
> * is, another process may have done GetNewMultiXactId but not yet written
> * the offset entry for that ID. In that scenario, it is guaranteed that
> * the offset entry for that multixact exists (because GetNewMultiXactId
> * won't release MultiXactGenLock until it does) but contains zero
> * (because we are careful to pre-zero offset pages). Because
> * GetNewMultiXactId will never return zero as the starting offset for a
> * multixact, when we read zero as the next multixact's offset, we know we
> * have this case. We handle this by sleeping on the condition variable
> * we have just for this; the process in charge will signal the CV as soon
> * as it has finished writing the multixact offset.
> *
> * 3. Because GetNewMultiXactId increments offset zero to offset one to
> * handle case #2, there is an ambiguity near the point of offset
> * wraparound. If we see next multixact's offset is one, is that our
> * multixact's actual endpoint, or did it end at zero with a subsequent
> * increment? We handle this using the knowledge that if the zero'th
> * member slot wasn't filled, it'll contain zero, and zero isn't a valid
> * transaction ID so it can't be a multixact member. Therefore, if we
> * read a zero from the members array, just ignore it.
> *
> * This is all pretty messy, but the mess occurs only in infrequent corner
> * cases, so it seems better than holding the MultiXactGenLock for a long
> * time on every multixact creation.
> */
With 64-bit offsets, can we assume that it never wraps around? We often
treat 2^64 as "large enough that we'll never run out", e.g. LSNs are
also assumed to never wrap around. I think that would be a safe
assumption here too.
If we accept that, we don't need to worry about case 3 anymore. But if
we upgrade wrapped-around members files by just renaming them, there
could still be a members array where we had skipped offset 0, and
reading that after the upgrade might get confused. We could continue to
ignore a 0 XID in the members array like the comment says; I think that
would be enough. But yeah, maybe it's better to bite the bullet in
pg_upgrade and squeeze those out.
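To make case 3 concrete, here is a minimal sketch of how the member count falls out of two consecutive offsets, before and after the change. It is not code from multixact.c: count_members_32, count_members_64 and demo_read_member are hypothetical names, and demo_read_member is only a stand-in for reading the members SLRU, assuming slot 0 was skipped at the wrap point and therefore holds an invalid (zero) XID.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for reading one member XID from the members SLRU.
 * Assumption for this demo: slot 0 was skipped at the wrap point, so it
 * holds 0 (an invalid XID); every other slot holds some nonzero XID.
 */
static uint32_t
demo_read_member(uint32_t offset)
{
    return (offset == 0) ? 0 : 1000 + (offset & 0xFF);
}

/*
 * Old scheme: offsets are 32 bits wide and wrap at 2^32.  The raw length is
 * the modular difference of the two offsets; a zero XID inside that range is
 * the skipped slot 0 and is not counted (case 3 in the comment above).
 */
static uint32_t
count_members_32(uint32_t offset, uint32_t next_offset,
                 uint32_t (*read_member) (uint32_t))
{
    uint32_t    length = next_offset - offset;  /* modulo 2^32 */
    uint32_t    truelength = 0;

    for (uint32_t i = 0; i < length; i++)
    {
        /* offset + i also wraps modulo 2^32 */
        if (read_member(offset + i) != 0)
            truelength++;
    }
    return truelength;
}

/*
 * New scheme, assuming 64-bit offsets never wrap: a plain subtraction is
 * enough, provided no skipped slot survives the upgrade.
 */
static uint64_t
count_members_64(uint64_t offset, uint64_t next_offset)
{
    return next_offset - offset;
}
```

For a multixact starting at 0xFFFFFFFE whose successor starts at 1, the 32-bit version sees a raw length of 3 but reports 2 members, because slot 0 reads as zero. The 64-bit version only gives the right answer if the members were repacked so that no such gap remains, which is the trade-off discussed above.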
Does your upgrade test suite include case 3, where the next multixact's
offset is 1?
Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other
comments and checks that talk about binary-upgraded values too that we
can hopefully clean up now.
If we are to parse the member segments in detail in upgrade anyway, I'd
be tempted to make some further changes / optimizations:
- You could leave out all locking XID members in upgrade, because
they're not relevant after upgrade any more (all the XIDs will be
committed or aborted and have released the locks; we require prepared
transactions to be completed before upgrading too). It'd be enough to
include actual UPDATE/DELETE XIDs.
- The way we determine the length of the members array by looking at the
next multixid's offset is a bit complicated. We could have one extra
flag per XID in the members to indicate "this is the last member of this
multixid". That could either replace the current mechanism of looking
at the next offset, or be just an additional cross-check.
- Do we still like the "group" representation, with 4 bytes of flags
followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes per
XID unaligned.
- A more radical idea: There can be only one updating XID in one
multixid. We could store that directly in the offsets SLRU, and keep
only the locking XIDs in members. That way, the members SLRU would
become less critical; it could be safely reset on crash for example
(except for prepared transactions, which could still be holding locks,
but it'd still be less serious). Separating correctness-critical data
from more ephemeral state is generally a good idea.
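As a rough illustration of the "group" versus "5 bytes per XID" question, the following sketch compares how many members fit on one page under each layout. It assumes the default 8192-byte block size and one flag byte per member; the function names are made up for this example and do not exist in PostgreSQL.

```c
/* Current layout: a group is 4 flag bytes followed by 4 four-byte XIDs. */
#define MEMBERS_PER_GROUP   4
#define GROUP_SIZE          (4 + MEMBERS_PER_GROUP * 4)    /* 20 bytes */

/* Alternative layout: one flag byte plus one four-byte XID, unaligned. */
#define PACKED_MEMBER_SIZE  5

/* Members per page when whole 20-byte groups are laid out on the page. */
static int
members_per_page_grouped(int blcksz)
{
    return (blcksz / GROUP_SIZE) * MEMBERS_PER_GROUP;
}

/* Members per page when each member takes 5 unaligned bytes. */
static int
members_per_page_packed(int blcksz)
{
    return blcksz / PACKED_MEMBER_SIZE;
}
```

With 8192-byte pages this gives 1636 members per page for the grouped layout and 1638 for the packed one, so the density is nearly identical. Under these assumptions the choice is mostly about alignment and code simplicity rather than space.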
I'm not insisting on any of these changes, just some things that might
be worth considering if we're rewriting the SLRUs on upgrade anyway.
--
Heikki Linnakangas
Neon (https://neon.tech)
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-11-13 15:44 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-15 11:06 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
@ 2024-11-15 16:19 ` Maxim Orlov <[email protected]>
2024-11-18 13:22 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-11-15 16:19 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: wenhui qiu <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
On Fri, 15 Nov 2024 at 14:06, Heikki Linnakangas <[email protected]> wrote:
> Hmm, so if I understand correctly, this is related to how we determine
> the length of the members array, by looking at the next multixid's
> offset. This is explained in GetMultiXactIdMembers:
>
Correct.
> If we accept that, we don't need to worry about case 3 anymore. But if
> we upgrade wrapped-around members files by just renaming them, there
> could still be a members array where we had skipped offset 0, and
> reading that after the upgrade might get confused. We could continue to
> ignore a 0 XID in the members array like the comment says; I think that
> would be enough. But yeah, maybe it's better to bite the bullet in
> pg_upgrade and squeeze those out.
>
Correct. I couldn't explain this better. I'm more in favour of squeezing
those out. Otherwise, we end up adding another hack to multixact, whereas
one of the benefits of switching to 64 bits is that it should make the XID
logic more straightforward. After all, mxact juggling in pg_upgrade is a
one-time inconvenience.
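The "squeeze out" idea can be sketched roughly as follows. This is only an illustration, not the code in segresize.c: repack_offsets is a hypothetical function, the old offsets are assumed to be given oldest first with the old next offset as the final entry, and the new numbering is assumed to start at 1 so that 0 keeps its "not yet written" meaning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Translate n old 32-bit start offsets into 64-bit offsets with the skipped
 * slot 0 removed.  old_off has n + 1 entries (the last one is the old next
 * offset); new_off receives n + 1 entries as well.  slot0_empty tells
 * whether slot 0 was skipped at the wrap point, which the caller would have
 * to find out by reading the members.
 */
static void
repack_offsets(const uint32_t *old_off, size_t n, bool slot0_empty,
               uint64_t *new_off)
{
    uint64_t    next = 1;       /* assumption: offset 0 stays reserved */

    for (size_t i = 0; i < n; i++)
    {
        uint32_t    start = old_off[i];
        uint32_t    end = old_off[i + 1];
        uint32_t    len = end - start;  /* modulo 2^32 */

        /* [start, end) contains slot 0 only if it crossed the wrap point */
        if (slot0_empty && end < start && end != 0)
            len--;

        new_off[i] = next;
        next += len;
    }
    new_off[n] = next;
}
```

For old offsets 0xFFFFFFFC, 0xFFFFFFFE, 1 with next offset 4 and an empty slot 0, the new offsets come out as 1, 3, 5 with next offset 8: the gap is bypassed once during the upgrade, and multixact.c no longer needs to know about it.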
>
> Does your upgrade test suite include case 3, where the next multixact's
> offset is 1?
>
Not exactly.
simple
Latest checkpoint's NextMultiXactId: 119441
Latest checkpoint's NextMultiOffset: 5927049
offset-wrap
Latest checkpoint's NextMultiXactId: 119441
Latest checkpoint's NextMultiOffset: 5591183
multi-wrap
Latest checkpoint's NextMultiXactId: 82006
Latest checkpoint's NextMultiOffset: 7408811
offset-multi-wrap
Latest checkpoint's NextMultiXactId: 52146
Latest checkpoint's NextMultiOffset: 5591183
Do you want a test case where NextMultiOffset is 1?
> Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other
> comments and checks that talk about binary-upgraded values too that we
> can hopefully clean up now.
>
Yes, technically we can. But this is kind of unrelated to the offsets and
will make the patch set significantly more complicated, thus harder to
review and less likely to be committed. Again, I'm not opposing the idea;
I'm just not sure it is worth doing right now.
>
> If we are to parse the member segments in detail in upgrade anyway, I'd
> be tempted to make some further changes / optimizations:
>
> - You could leave out all locking XID members in upgrade, because
> they're not relevant after upgrade any more (all the XIDs will be
> committed or aborted and have released the locks; we require prepared
> transactions to be completed before upgrading too). It'd be enough to
> include actual UPDATE/DELETE XIDs.
>
> - The way we determine the length of the members array by looking at the
> next multixid's offset is a bit complicated. We could have one extra
> flag per XID in the members to indicate "this is the last member of this
> multixid". That could either replace the current mechanism of looking
> at the next offset, or be just an additional cross-check.
>
> - Do we still like the "group" representation, with 4 bytes of flags
> followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes per
> XID unaligned.
>
Not really. But I would leave it for the next iteration: switching multi to
64 bits. I already have some drafts for this. In any case, we'll have to
make adjustments in pg_upgrade again. My goal is to move towards 64-bit
XIDs, but in small steps, and I plan to change the "group" representation
in combination with switching multi to 64 bits. This seems a bit more
appropriate in my view.
As for your optimization suggestions, I like them. I'm not against them,
but I'm afraid of disrupting the clarity of thought, especially since the
algorithm is not the simplest.
--
Best regards,
Maxim Orlov.
^ permalink raw reply [nested|flat] 21+ messages in thread
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-11-13 15:44 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-15 11:06 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-11-15 16:19 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-11-18 13:22 ` Maxim Orlov <[email protected]>
2024-11-19 17:53 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Orlov @ 2024-11-18 13:22 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: wenhui qiu <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
Shame on me! I sent an erroneous patch set. Version 7 is defective. Here
is the proper version, v8, with minor refactoring in segresize.c.
Also, I renamed the catversion bump patch to .txt so as not to break cfbot.
--
Best regards,
Maxim Orlov.
From 73b8663093ff1c58def9a80abab142a12c993bf6 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 13 Nov 2024 16:34:34 +0300
Subject: [PATCH v8 5/5] TEST: bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/catalog/catversion.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..18faedc963 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411112
/*
* large object chunk size added to pg_controldata,
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5dd91e190a..3d09caf5ae 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411111
+#define CATALOG_VERSION_NO 202411112
#endif
--
2.43.0
Attachments:
[text/plain] v8-0005-TEST-bump-catver.patch.txt (1.1K, 3-v8-0005-TEST-bump-catver.patch.txt)
download | inline diff:
[application/octet-stream] v8-0002-Use-64-bit-multixact-offsets.patch (13.3K, 4-v8-0002-Use-64-bit-multixact-offsets.patch)
download | inline diff:
From ad9a1509fd5cd68838169b3465ab4c5f9827a4e3 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v8 2/5] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 172 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 11 insertions(+), 169 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..c51e03e832 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1976,7 +1891,7 @@ MultiXactShmemInit(void)
"pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
LWTRANCHE_MULTIXACTOFFSET_SLRU,
SYNC_HANDLER_MULTIXACT_OFFSET,
- false);
+ true);
SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
SimpleLruInit(MultiXactMemberCtl,
"multixact_member", multixact_member_buffers, 0,
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..e1b3187d0b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -664,7 +664,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
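
For readers following the MultiXactOffsetPrecedes() hunk above, here is a minimal standalone sketch of the signed-difference idiom at both widths. The names offset_precedes32 and offset_precedes64 are hypothetical and are not part of the patch; the <stdint.h> types stand in for PostgreSQL's uint32/uint64, and the narrowing casts assume the usual two's complement behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pre-patch shape (hypothetical name): 32-bit offsets compared modulo 2^32. */
static bool
offset_precedes32(uint32_t offset1, uint32_t offset2)
{
    int32_t diff = (int32_t) (offset1 - offset2);

    return diff < 0;
}

/* Post-patch shape (hypothetical name): same idiom on 64-bit offsets. */
static bool
offset_precedes64(uint64_t offset1, uint64_t offset2)
{
    int64_t diff = (int64_t) (offset1 - offset2);

    return diff < 0;
}
```

With 32-bit offsets, 0xFFFFFFF0 is treated as preceding 5 because the difference wraps modulo 2^32. With 64-bit offsets the same pair compares in plain numeric order, which appears to be the practical effect of widening the typedef.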
[application/x-sh] generate.sh (2.1K, 5-generate.sh)
[application/x-sh] test.sh (2.1K, 6-test.sh)
[text/x-python-script] multixids.py (1.6K, 7-multixids.py)
download | inline:
#!/usr/bin/env python3
import sys
import threading
import psycopg2


def test_multixact(tblname: str):
    with psycopg2.connect(dbname="postgres") as conn:
        cur = conn.cursor()
        cur.execute(
            f"""
            DROP TABLE IF EXISTS {tblname};
            CREATE TABLE {tblname}(i int primary key, n_updated int) WITH (autovacuum_enabled=false);
            INSERT INTO {tblname} select g, 0 from generate_series(1, 50) g;
            """
        )

    # Lock entries using parallel connections in a round-robin fashion.
    nclients = 50
    update_every = 97
    connections = []
    for _ in range(nclients):
        # Do not turn on autocommit. We want to hold the key-share locks.
        conn = psycopg2.connect(dbname="postgres")
        connections.append(conn)

    # On each iteration, we commit the previous transaction on a connection,
    # and issue another select. Each SELECT generates a new multixact that
    # includes the new XID, and the XIDs of all the other parallel transactions.
    # This generates enough traffic on both multixact offsets and members SLRUs
    # to cross page boundaries.
    for i in range(20000):
        conn = connections[i % nclients]
        conn.commit()

        # Perform some non-key UPDATEs too, to exercise different multixact
        # member statuses.
        if i % update_every == 0:
            conn.cursor().execute(f"update {tblname} set n_updated = n_updated + 1 where i = {i % 50}")
        else:
            conn.cursor().execute(f"select * from {tblname} for key share")


test_multixact(sys.argv[1])
[application/octet-stream] v8-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 8-v8-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
download | inline diff:
From fdb7e2eee33dfb5df714d8d16112d4c907475d78 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v8 1/5] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 05c738d661..727b6e744f 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -876,8 +876,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.43.0
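
As a side note on the pattern used throughout v8-0001: once the offset is a 64-bit type, "%u" no longer matches the argument, so each call site switches to "%llu" with an explicit cast. A minimal sketch, assuming a stand-in typedef and a hypothetical helper format_offset() (neither is taken from the patch):

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the widened typedef in c.h (assumption for this sketch). */
typedef uint64_t MultiXactOffset;

/*
 * Hypothetical helper: format an offset the way the patched call sites do,
 * with %llu and a cast to unsigned long long, so the format string does not
 * depend on how the platform defines its 64-bit integer type.
 */
static int
format_offset(char *buf, size_t buflen, MultiXactOffset off)
{
    return snprintf(buf, buflen, "offset %llu", (unsigned long long) off);
}
```

For example, format_offset(buf, sizeof(buf), 4294967296) yields "offset 4294967296", a value that a 32-bit "%u" could not represent.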
[application/octet-stream] v8-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (8.8K, 9-v8-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
download | inline diff:
From edff3857e1cb6c67e75be1b00fd5da1cd4bde343 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 23 Oct 2024 18:23:39 +0300
Subject: [PATCH v8 4/5] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is UINT64_MAX now, the MULTIXACT_MEMBER_SAFE_THRESHOLD and
MULTIXACT_MEMBER_DANGER_THRESHOLD values are no longer meaningful. Thus,
MultiXactMemberFreezeThreshold is no longer needed either.
Instead, switch to a single MULTIXACT_MEMBER_AUTOVAC_THRESHOLD (0xFFFFFFFF, i.e.
2^32 - 1) members threshold. It is used to determine whether autovacuum needs to be forced.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 117 +++----------------------
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/include/access/multixact.h | 1 -
4 files changed, 15 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c51e03e832..c1f228c5fb 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -204,10 +204,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release disk space and reduce table bloat if
+ * possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -2616,15 +2620,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine whether autovacuum is needed for multixact members.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2712,10 +2714,10 @@ SetOffsetVacuumLimit(bool is_startup)
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2761,101 +2763,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 86f36b3695..e7506e268a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,7 +1133,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..180bb7e96e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1915,7 +1915,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
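
To make the new return condition of SetOffsetVacuumLimit() in v8-0004 easier to follow, here is a minimal sketch of the check in isolation. The function name needs_autovacuum() and the macro name MEMBER_AUTOVAC_THRESHOLD are hypothetical; the constant mirrors the 0xFFFFFFFF value in the patch, and shared-state locking is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the value of MULTIXACT_MEMBER_AUTOVAC_THRESHOLD in the patch. */
#define MEMBER_AUTOVAC_THRESHOLD UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical standalone version of the check: request autovacuum when the
 * oldest offset is unknown, or when more than the threshold number of members
 * lies between the oldest and the next offset.
 */
static bool
needs_autovacuum(bool oldest_offset_known,
                 uint64_t next_offset, uint64_t oldest_offset)
{
    return !oldest_offset_known ||
        (next_offset - oldest_offset > MEMBER_AUTOVAC_THRESHOLD);
}
```

Because the comparison is strict, exactly 0xFFFFFFFF members in use does not trigger the request, while 2^32 members does.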
[application/octet-stream] v8-0003-Make-pg_upgrade-convert-multixact-offsets.patch (18.3K, 10-v8-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
download | inline diff:
From c324315152346d7f2090aaf79b142726aa2486ae Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v8 3/5] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
Author: Yura Sokolov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 42 ++-
src/bin/pg_upgrade/pg_upgrade.h | 14 +-
src/bin/pg_upgrade/segresize.c | 541 ++++++++++++++++++++++++++++++++
5 files changed, 594 insertions(+), 5 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..1654e877c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,8 +750,42 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
- copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ /*
+ * If the old server predates MULTIXACTOFFSET_FORMATCHANGE_CAT_VER,
+ * it has 32-bit multixid offsets, which must be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ MultiXactOffset oldest_offset,
+ next_offset;
+
+ remove_new_subdir("pg_multixact/offsets", false);
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+ oldest_offset = convert_multixact_offsets();
+ check_ok();
+
+ remove_new_subdir("pg_multixact/members", false);
+ prep_status("Converting pg_multixact/members");
+ convert_multixact_members(oldest_offset);
+ check_ok();
+
+ next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+ if (oldest_offset)
+ {
+ if (next_offset < oldest_offset)
+ next_offset += ((MultiXactOffset) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ {
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ }
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +794,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..2c85ec1e94 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit for multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,8 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+MultiXactOffset convert_multixact_offsets(void);
+void convert_multixact_members(MultiXactOffset oldest_offset);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..2f6f3b3288
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,541 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+ bool long_segment_names;
+} SlruSegState;
+
+/*
+ * Mirrors the SlruFileName from slru.c
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ if (state->long_segment_names)
+ {
+ Assert(state->segno >= 0 &&
+ state->segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+ return psprintf("%s/%015llX", state->dir, (long long) state->segno);
+ }
+ else
+ {
+ Assert(state->segno >= 0 && state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+ }
+}
+
+/*
+ * Create new SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* Segment file does exist, read page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found. This avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+}
+
+typedef uint32 MultiXactOffsetOld;
+
+#define MaxMultiXactOffsetOld ((MultiXactOffsetOld) 0xFFFFFFFF)
+
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(MultiXactOffsetOld))
+#define MULTIXACT_OFFSETS_PER_PAGE_NEW (BLCKSZ / sizeof(MultiXactOffset))
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+MultiXactOffset
+convert_multixact_offsets(void)
+{
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ MultiXactOffsetOld oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE_NEW] = {0},
+ oldest_offset = 0;
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool oldest_offset_known = false;
+
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ oldseg.long_segment_names = false;
+
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ newseg.long_segment_names = true;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ /* Copy multi offsets reading only needed segment pages */
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound */
+#define OLD_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_OLD / SLRU_PAGES_PER_SEGMENT)
+ if (oldseg.segno > OLD_OFFSET_SEGNO_MAX)
+ {
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+ }
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Save oldest multi offset */
+ if (!oldest_offset_known)
+ {
+ oldest_offset = oldbuf[old_entry];
+ oldest_offset_known = true;
+ }
+
+ /* Skip wrapped-around invalid MultiXactIds */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+ Assert(new_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ new_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound (1 becomes 2^32) */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1 */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE_NEW)
+ {
+ /* Handle possible segment wraparound */
+#define NEW_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_NEW / SLRU_PAGES_PER_SEGMENT)
+ if (newseg.segno > NEW_OFFSET_SEGNO_MAX)
+ {
+ newseg.segno = 0;
+ newseg.pageno = 0;
+ }
+
+ /* Write new page */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE_NEW - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Use next_offset as oldest_offset, if oldest_multi == next_multi */
+ if (!oldest_offset_known)
+ {
+ Assert(oldest_multi == next_multi);
+ oldest_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ }
+
+ /* Release resources */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ return oldest_offset;
+}
+
+#define MXACT_MEMBERS_FLAG_BYTES 1
+
+#define MULTIXACT_MEMBERS_PER_GROUP 4
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (MULTIXACT_MEMBERS_PER_GROUP * (sizeof(TransactionId) + MXACT_MEMBERS_FLAG_BYTES))
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE \
+ (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERS_PER_GROUP * MULTIXACT_MEMBERGROUPS_PER_PAGE)
+#define MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP \
+ (MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP)
+
+typedef struct MultiXactMembersCtx
+{
+ SlruSegState seg;
+ char buf[BLCKSZ];
+ int group;
+ int member;
+ char *flag;
+ TransactionId *xid;
+} MultiXactMembersCtx;
+
+static void
+MultiXactMembersCtxInit(MultiXactMembersCtx *ctx)
+{
+ ctx->seg.dir = psprintf("%s/pg_multixact/members", new_cluster.pgdata);
+ ctx->seg.long_segment_names = false;
+
+ ctx->group = 0;
+ ctx->member = 1; /* skip invalid zero offset */
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+
+ ctx->flag += ctx->member;
+ ctx->xid += ctx->member;
+}
+
+static void
+MultiXactMembersCtxAdd(MultiXactMembersCtx *ctx, char flag, TransactionId xid)
+{
+ /* Copy member's xid and flags to the new page */
+ *ctx->flag++ = flag;
+ *ctx->xid++ = xid;
+
+ if (++ctx->member < MULTIXACT_MEMBERS_PER_GROUP)
+ return;
+
+ /* Start next member group */
+ ctx->member = 0;
+
+ if (++ctx->group >= MULTIXACT_MEMBERGROUPS_PER_PAGE)
+ {
+ /* Write current page and start new */
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ ctx->group = 0;
+ memset(ctx->buf, 0, BLCKSZ);
+ }
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+}
+
+static void
+MultiXactMembersCtxFinit(MultiXactMembersCtx *ctx)
+{
+ if (ctx->flag > (char *) ctx->buf)
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ close_segment(&ctx->seg);
+
+ pfree(ctx->seg.dir);
+}
+
+/*
+ * Convert pg_multixact/members segments, offsets will start from 1.
+ *
+ */
+void
+convert_multixact_members(MultiXactOffset oldest_offset)
+{
+ MultiXactOffset next_offset,
+ offset;
+ SlruSegState oldseg = {0};
+ char oldbuf[BLCKSZ] = {0};
+ int oldidx;
+ MultiXactMembersCtx newctx = {0};
+
+ oldseg.dir = psprintf("%s/pg_multixact/members", old_cluster.pgdata);
+
+ next_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ /* Initialize the old starting position */
+ oldseg.pageno = oldest_offset / MULTIXACT_MEMBERS_PER_PAGE;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+ oldseg.long_segment_names = false;
+
+ /* Initialize new starting position */
+ MultiXactMembersCtxInit(&newctx);
+
+ /* Iterate through the original directory */
+ oldidx = oldest_offset % MULTIXACT_MEMBERS_PER_PAGE;
+ for (offset = oldest_offset; offset < next_offset;)
+ {
+ bool empty;
+ int oldlen;
+ int ngroups;
+ int oldgroup;
+ int oldmember;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Iterate through the old member groups */
+ ngroups = oldlen / MULTIXACT_MEMBERGROUP_SIZE;
+ oldmember = oldidx % MULTIXACT_MEMBERS_PER_GROUP;
+ oldgroup = oldidx / MULTIXACT_MEMBERS_PER_GROUP;
+ while (oldgroup < ngroups && offset < next_offset)
+ {
+ char *oldflag;
+ TransactionId *oldxid;
+ int i;
+
+ oldflag = (char *) oldbuf + oldgroup * MULTIXACT_MEMBERGROUP_SIZE;
+ oldxid = (TransactionId *)(oldflag + MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP);
+
+ oldxid += oldmember;
+ oldflag += oldmember;
+
+ /* Iterate through the old members */
+ for (i = oldmember;
+ i < MULTIXACT_MEMBERS_PER_GROUP && offset < next_offset;
+ i++)
+ {
+ MultiXactMembersCtxAdd(&newctx, *oldflag++, *oldxid++);
+
+ if (++offset == (uint64) 1 << 32)
+ {
+ Assert(i == MaxMultiXactOffsetOld % MULTIXACT_MEMBERS_PER_GROUP);
+ goto wraparound;
+ }
+ }
+
+ oldgroup++;
+ oldmember = 0;
+ }
+
+ oldidx = 0;
+
+ continue;
+
+wraparound:
+#define SEGNO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE / SLRU_PAGES_PER_SEGMENT)
+#define PAGENO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE % SLRU_PAGES_PER_SEGMENT)
+ Assert((oldseg.segno == SEGNO_MAX && oldseg.pageno == PAGENO_MAX + 1) ||
+ (oldseg.segno == SEGNO_MAX + 1 && oldseg.pageno == 0));
+
+ /* Switch to segment 0000 */
+ close_segment(&oldseg);
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+
+ /* skip invalid zero multi offset */
+ oldidx = 1;
+ }
+
+ MultiXactMembersCtxFinit(&newctx);
+
+ /* Release resources */
+ close_segment(&oldseg);
+
+ pfree(oldseg.dir);
+}
--
2.43.0
* Re: POC: make mxidoff 64 bits
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Re: POC: make mxidoff 64 bits Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-22 09:43 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-10-25 03:38 ` Re: POC: make mxidoff 64 bits wenhui qiu <[email protected]>
2024-11-08 18:10 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-11-13 15:44 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-15 11:06 ` Re: POC: make mxidoff 64 bits Heikki Linnakangas <[email protected]>
2024-11-15 16:19 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-11-18 13:22 ` Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
@ 2024-11-19 17:53 ` Maxim Orlov <[email protected]>
0 siblings, 0 replies; 21+ messages in thread
From: Maxim Orlov @ 2024-11-19 17:53 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: wenhui qiu <[email protected]>; Alexander Korotkov <[email protected]>; Postgres hackers <[email protected]>
Oops! Sorry for the noise. I must have been overworked yesterday and
messed up the working branches. v7 was a correct set and v8 was not. Here is
the correction, with an extended Perl test.
The test itself is in src/bin/pg_upgrade/t/005_offset.pl. It is rather heavy:
it took about 45 minutes on my i5 and generated 2.7 GB of data. Basically,
each test here creates a cluster and fills it with multixacts, so that
dozens of segments are created. Two methods are used. One is based on prepared
transactions and creates roughly the same number of segments for
members and for offsets. The other is based on Heikki's multixids.py
and creates more members than offsets. I've used both methods to
generate data that is as diverse as possible.
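To give a feel for how many multixacts it takes to fill "dozens of segments", here is a rough sketch of the capacity arithmetic, following the macros in segresize.c from the patch above. It assumes the default 8 kB block size, 32 pages per SLRU segment and a 4-byte TransactionId; the SKETCH_* names and both helper functions are illustrative only and are not part of the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed defaults; a real build takes these from pg_config.h and slru.h. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_PAGES_PER_SEGMENT 32

/* Offsets SLRU: one fixed-width offset entry per multixact. */
static uint64_t
offsets_per_segment(size_t offset_width)
{
    assert(offset_width > 0);
    return (uint64_t) (SKETCH_BLCKSZ / offset_width) * SKETCH_PAGES_PER_SEGMENT;
}

/* Members SLRU: groups of 4 flag bytes followed by 4 xids of 4 bytes each. */
static uint64_t
members_per_segment(void)
{
    const size_t members_per_group = 4;
    const size_t group_size = members_per_group * (sizeof(uint32_t) + 1);
    const size_t groups_per_page = SKETCH_BLCKSZ / group_size;

    return (uint64_t) (members_per_group * groups_per_page) * SKETCH_PAGES_PER_SEGMENT;
}
```

Under these assumptions, calling offsets_per_segment(4) gives the old 32-bit layout and offsets_per_segment(8) the new 64-bit one, so the same number of multixacts needs twice as many offsets segments after conversion, while the members layout stays the same.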
Here is how I test this patch set:
1. You need two pg clusters: the "old" one, i.e. without the patch set, and
the "new" one with patch set v9 applied.
2. Apply v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt
to the "old" and "new" clusters. Note that this is the only patch required for
the "old" cluster. It allows you to create a cluster with non-standard
initial multixact and multixact offset values. Unfortunately, this patch
did not arouse public interest, since it is assumed that the pg_resetwal
utility provides similar functionality. But similar does not mean equal.
pg_resetwal must be used after cluster init, so we run into some
problems with vacuum, and some SLRU segments must be filled with zeroes.
Also, template0 datminmxid must be updated manually. So, in my view,
using this patch is justified and very handy here.
3. Also, apply all the "TEST" (0006 and 0007) patches to the "new"
cluster.
4. Build "old" and "new" pg clusters.
5. Run the test with: PROVE_TESTS=t/005_offset.pl PG_TEST_NOCLEAN=1
oldinstall=/home/orlov/proj/OFFSET3/pgsql-old make check -s -C
src/bin/pg_upgrade/
6. In my case, it took around 45 minutes and generated roughly 2.7 GB of
data.
The "TEST" patches are, of course, for testing purposes only and are not to be
committed.
In src/bin/pg_upgrade/t/005_offset.pl I try to cover the following cases:
- Basic sanity checks.
Here I test various initial multi and offset values (including
wraparound) and check that the appropriate segments are generated.
- pg_upgrade tests.
This is what the oldinstall environment variable is for. Run pg_upgrade on
an old cluster with multi and offset values just like in the previous step,
i.e. with various combinations.
- Self pg_upgrade.
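The wraparound cases in these tests come down to how an old 32-bit offset is rebased during conversion. Below is a minimal standalone sketch of that arithmetic, modelled on the inner loop of convert_multixact_offsets() in the patch; the function name rebase_offset is made up for illustration, and the sketch assumes that old offsets run from 1 to 2^32-1 with 0 being invalid.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rebase an old 32-bit multixact offset so that the oldest live offset
 * becomes 1. Since offset 0 is skipped on wraparound, a wrapped value is
 * lifted by 2^32 - 1 rather than by 2^32 (old offset 1 becomes 2^32).
 */
static uint64_t
rebase_offset(uint32_t old_offset, uint32_t oldest_offset)
{
    uint64_t offset = old_offset;
    uint64_t result;

    /* A value below the oldest offset must have wrapped around. */
    if (offset < oldest_offset)
        offset += ((uint64_t) 1 << 32) - 1;

    /* Subtract oldest_offset, so that new offsets start from 1. */
    result = offset - oldest_offset + 1;
    assert(result >= 1);

    return result;
}
```

The same shift appears to be applied to the next multixact offset in copy_xact_xlog_xid() before it is passed to pg_resetwal, so offsets stored in the converted segments and the counter in the control file stay consistent with each other.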
--
Best regards,
Maxim Orlov.
From 2642f597832cbed0ebc54202de4e0f5770ac5f50 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v9 5/7] TEST: initdb option to initialize cluster with
non-standard xid/mxid/mxoff
To date, testing database cluster wraparound was not easy, as initdb has always
initialized it with the default xid/mxid/mxoff. The option to specify any valid
xid/mxid/mxoff at cluster startup will make these things easier.
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%4...
---
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 ++++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++--
src/backend/bootstrap/bootstrap.c | 50 +++++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 +++-
src/backend/tcop/postgres.c | 53 +++++++++++-
src/bin/initdb/initdb.c | 107 ++++++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++++
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
13 files changed, 382 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the first page of the commit log */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a817f539ee..095c39dd93 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the first page of the offsets log */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -1978,7 +1999,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the first page of the members log */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we're not starting from offset zero, initialize a dummy multixact to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index d31a67599c..8c33b8ba9d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -217,7 +217,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -272,12 +272,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid value")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 78e66a06ac..483307279f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -572,7 +572,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -669,10 +669,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -723,6 +731,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 184b830168..4fd594cfe5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..410868dddf 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^32-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^32-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^32-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Reject the value instead of silently raising mxid to
+ * FirstMultiXactId, although that would be harmless.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Reject the value instead of silently raising xid to
+ * FirstNormalTransactionId, although that would be harmless.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index e1b3187d0b..f770e9a140 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
--
2.43.0
From 33e21cf86b1813a67c699d703ab1f75bcf28a7b1 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 13 Nov 2024 16:34:34 +0300
Subject: [PATCH v9 7/7] TEST: bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/catalog/catversion.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..18faedc963 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411112
/*
* large object chunk size added to pg_controldata,
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5dd91e190a..3d09caf5ae 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411111
+#define CATALOG_VERSION_NO 202411112
#endif
--
2.43.0
From 3558ccb4712d50bcda877474db5c9fd124b6e919 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 19 Nov 2024 17:08:10 +0300
Subject: [PATCH v9 6/7] TEST: add src/bin/pg_upgrade/t/005_offset.pl
---
src/bin/pg_upgrade/t/005_offset.pl | 562 +++++++++++++++++++++++++++++
1 file changed, 562 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/005_offset.pl
diff --git a/src/bin/pg_upgrade/t/005_offset.pl b/src/bin/pg_upgrade/t/005_offset.pl
new file mode 100644
index 0000000000..1cfd8b364a
--- /dev/null
+++ b/src/bin/pg_upgrade/t/005_offset.pl
@@ -0,0 +1,562 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# This pair of calls will create significantly more member segments than offset
+# segments.
+sub prep
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+}
+
+sub fill
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ my $nclients = 50;
+ my $update_every = 90;
+ my @connections = ();
+
+ for (1..$nclients)
+ {
+ my $conn = $node->background_psql('postgres');
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ for (my $i = 0; $i < 20000; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} FOR KEY SHARE");
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+}
+
+# This pair of calls will create roughly the same number of member and
+# offset segments.
+sub prep2
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl}(BAR INT PRIMARY KEY, BAZ INT); " .
+ "CREATE OR REPLACE PROCEDURE MXIDFILLER(N_STEPS INT DEFAULT 1000) " .
+ "LANGUAGE PLPGSQL " .
+ "AS \$\$ " .
+ "BEGIN " .
+ " FOR I IN 1..N_STEPS LOOP " .
+ " UPDATE ${tbl} SET BAZ = RANDOM(1, 1000) " .
+ " WHERE BAR IN (SELECT BAR FROM ${tbl} " .
+ " TABLESAMPLE BERNOULLI(80)); " .
+ " COMMIT; " .
+ " END LOOP; " .
+ "END; \$\$; " .
+ "INSERT INTO ${tbl} (BAR, BAZ) " .
+ "SELECT ID, ID FROM GENERATE_SERIES(1, 1024) ID;");
+}
+
+sub fill2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ $node->safe_psql('postgres',
+ "BEGIN; " .
+ "SELECT * FROM ${tbl} FOR KEY SHARE; " .
+ "PREPARE TRANSACTION 'A'; " .
+ "CALL MXIDFILLER((365 * ${scale})::int); " .
+ "COMMIT PREPARED 'A';");
+}
+
+
+# generate around 2 offset segments and 55 member segments
+sub mxid_gen1
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ prep($node, $tbl);
+ fill($node, $tbl);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# generate around 10 offset segments and 12 member segments
+sub mxid_gen2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ prep2($node, $tbl);
+ fill2($node, $tbl, $scale);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# Fetch latest multixact checkpoint values.
+sub multi_bounds
+{
+ my ($node) = @_;
+ my $path = $node->config_data('--bindir');
+ my ($stdout, $stderr) = run_command([
+ $path . '/pg_controldata',
+ $node->data_dir
+ ]);
+ my @control_data = split("\n", $stdout);
+ my $next = undef;
+ my $oldest = undef;
+ my $next_offset = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiXactId:\s*(.*)$/mg)
+ {
+ $next = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's oldestMultiXid:\s*(.*)$/mg)
+ {
+ $oldest = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+ {
+ $next_offset = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if (defined($oldest) && defined($next) && defined($next_offset))
+ {
+ last;
+ }
+ }
+
+ die "Latest checkpoint's NextMultiXactId not found in control file!\n"
+ unless defined($next);
+
+ die "Latest checkpoint's oldestMultiXid not found in control file!\n"
+ unless defined($oldest);
+
+ die "Latest checkpoint's NextMultiOffset not found in control file!\n"
+ unless defined($next_offset);
+
+ return ($oldest, $next, $next_offset);
+}
+
+# Create node from existing bins.
+sub create_new_node
+{
+ my ($name, %params) = @_;
+
+ create_node(0, @_);
+}
+
+# Create node from ENV oldinstall
+sub create_old_node
+{
+ my ($name, %params) = @_;
+
+ if (!defined($ENV{oldinstall}))
+ {
+ die "oldinstall is not defined";
+ }
+
+ create_node(1, @_);
+}
+
+sub create_node
+{
+ my ($install_path_from_env, $name, %params) = @_;
+ my $scale = defined $params{scale} ? $params{scale} : 1;
+ my $multi = defined $params{multi} ? $params{multi} : undef;
+ my $offset = defined $params{offset} ? $params{offset} : undef;
+
+ my $node =
+ $install_path_from_env ?
+ PostgreSQL::Test::Cluster->new($name,
+ install_path => $ENV{oldinstall}) :
+ PostgreSQL::Test::Cluster->new($name);
+
+ $node->init(force_initdb => 1,
+ extra => [
+ $multi ? ('-m', $multi) : (),
+ $offset ? ('-o', $offset) : (),
+ ]);
+
+ # Fixup MOX patch quirk
+ if ($multi)
+ {
+ unlink $node->data_dir . '/pg_multixact/offsets/0000';
+ }
+ if ($offset)
+ {
+ unlink $node->data_dir . '/pg_multixact/members/0000';
+ }
+
+ $node->append_conf('postgresql.conf', 'fsync = off');
+ $node->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+
+ $node->start();
+ mxid_gen2($node, 'FOO', $scale);
+ mxid_gen1($node, 'BAR', $scale);
+ $node->restart();
+ $node->safe_psql('postgres', q(SELECT * FROM FOO)); # just in case...
+ $node->safe_psql('postgres', q(SELECT * FROM BAR));
+ $node->safe_psql('postgres', q(CHECKPOINT));
+ $node->stop();
+
+ return $node;
+}
+
+sub do_upgrade
+{
+ my ($oldnode, $newnode) = @_;
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--check'
+ ],
+ 'run of pg_upgrade --check');
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ $oldnode->start();
+ $newnode->start();
+
+ my $oldfoo = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newfoo = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldfoo, $newfoo, "select foo eq");
+
+ my $oldbar = $oldnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ my $newbar = $newnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ is($oldbar, $newbar, "select bar eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+
+ multi_bounds($oldnode);
+ multi_bounds($newnode);
+}
+
+my @TESTS = (
+ # tests without ENV oldinstall
+ 0, 1, 2, 3, 4, 5, 6,
+ # tests with "real" pg_upgrade
+ 100, 101, 102, 103, 104, 105, 106,
+ # self upgrade
+ 1000,
+);
+
+# =============================================================================
+# Basic sanity tests on a NEW bin
+# =============================================================================
+
+# starts from the zero
+SKIP:
+{
+ my $TEST_NO = 0;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mo',
+ scale => 1);
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value
+SKIP:
+{
+ my $TEST_NO = 1;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo',
+ scale => 1.15,
+ multi => '0x123400');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 2;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO',
+ scale => 1.15,
+ offset => '0x432100');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 3;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO',
+ scale => 1.15,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, multi wrap
+SKIP:
+{
+ my $TEST_NO = 4;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 5;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO_wrap',
+ scale => 1.15,
+ offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, offsets starts from the value,
+# multi wrap, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 6;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# pg_upgrade tests
+# =============================================================================
+
+# starts from the zero
+SKIP:
+{
+ my $TEST_NO = 100;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value
+SKIP:
+{
+ my $TEST_NO = 101;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0x123400');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 102;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0x432100');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets starts from the value
+SKIP:
+{
+ my $TEST_NO = 103;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, multi wrap
+SKIP:
+{
+ my $TEST_NO = 104;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets starts from the value, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 105;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from the value, offsets starts from the value,
+# multi wrap, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 106;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# Self upgrade
+# =============================================================================
+
+# starts from the zero
+SKIP:
+{
+ my $TEST_NO = 1000;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'self_upgrade';
+ my $oldnode = create_new_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+done_testing();
--
2.43.0
Attachments:
[text/plain] v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt (25.4K, 3-v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt)
From 2642f597832cbed0ebc54202de4e0f5770ac5f50 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v9 5/7] TEST: initdb option to initialize cluster with
non-standard xid/mxid/mxoff
Until now, testing database cluster wraparound has not been easy, because
initdb always initializes a cluster with the default xid/mxid/mxoff. An option
to specify any valid xid/mxid/mxoff at cluster initialization makes this easier.
Author: Maxim Orlov <[email protected]>
Author: Pavel Borisov <[email protected]>
Author: Svetlana Derevyanko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
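For readers skimming the series: the new -m, -o and -x handlers all apply
the same parse-and-validate pattern. Below is a minimal standalone sketch of
that pattern, not code from the patch. The names parse_start_value() and
START_VALUE_IS_VALID() are hypothetical, and the standard strtoull() is
assumed as a stand-in for the strtou64() call the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's Start*IsValid() macros. */
#define START_VALUE_IS_VALID(v) ((v) <= UINT64_C(0xFFFFFFFF))

/*
 * Parse a start value from an option argument (hypothetical helper).
 * Base 0 lets strtoull() accept decimal, octal and 0x-prefixed hex input.
 * Returns true and stores the value in *out on success, or false if the
 * argument is empty, has trailing garbage, overflows, or is out of range.
 */
bool
parse_start_value(const char *arg, uint64_t *out)
{
    char       *endptr;
    unsigned long long val;

    assert(arg != NULL && out != NULL);

    errno = 0;
    val = strtoull(arg, &endptr, 0);

    if (endptr == arg ||        /* no digits consumed */
        *endptr != '\0' ||      /* trailing garbage */
        errno != 0 ||           /* overflow reported by strtoull() */
        !START_VALUE_IS_VALID(val))
        return false;

    *out = (uint64_t) val;
    return true;
}
```

Under these assumptions, "65535" and "0xFFFFFF" are accepted, while "seven",
"12abc", an empty string and "0x100000000" are rejected. The accepted and
rejected spellings mirror the kinds of values the TAP tests pass to initdb.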
src/backend/access/transam/clog.c | 21 +++++
src/backend/access/transam/multixact.c | 53 ++++++++++++
src/backend/access/transam/subtrans.c | 8 +-
src/backend/access/transam/xlog.c | 15 ++--
src/backend/bootstrap/bootstrap.c | 50 +++++++++++-
src/backend/main/main.c | 6 ++
src/backend/postmaster/postmaster.c | 14 +++-
src/backend/tcop/postgres.c | 53 +++++++++++-
src/bin/initdb/initdb.c | 107 ++++++++++++++++++++++++-
src/bin/initdb/t/001_initdb.pl | 60 ++++++++++++++
src/include/access/xlog.h | 3 +
src/include/c.h | 4 +
src/include/catalog/pg_class.h | 2 +-
13 files changed, 382 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index e6f79320e9..17e29f4497 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
{
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+ int64 pageno;
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the commit log page that contains nextXid */
+ slotno = ZeroCLOGPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(XactCtl, slotno);
+ Assert(!XactCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a817f539ee..095c39dd93 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
{
int slotno;
LWLock *lock;
+ int64 pageno;
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the offsets log page that contains nextMXact */
+ slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+ Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -1978,7 +1999,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+ if (pageno != 0)
+ {
+ LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+ if (nextlock != lock)
+ {
+ LWLockRelease(lock);
+ LWLockAcquire(nextlock, LW_EXCLUSIVE);
+ lock = nextlock;
+ }
+
+ /* Create and zero the members log page that contains nextOffset */
+ slotno = ZeroMultiXactMemberPage(pageno, false);
+
+ /* Make sure it's written out */
+ SimpleLruWritePage(MultiXactMemberCtl, slotno);
+ Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
+ }
+
LWLockRelease(lock);
+
+ /*
+ * If we are not starting from offset zero, initialize a dummy multixact to
+ * avoid an overly long loop in PerformMembersTruncation().
+ */
+ if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0)
+ {
+ RecordNewMultiXact(FirstMultiXactId,
+ MultiXactState->nextOffset, 0, NULL);
+ RecordNewMultiXact(MultiXactState->nextMXact,
+ MultiXactState->nextOffset, 0, NULL);
+ }
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 50bb1d8cfc..a5e6e8f090 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -270,12 +270,16 @@ void
BootStrapSUBTRANS(void)
{
int slotno;
- LWLock *lock = SimpleLruGetBankLock(SubTransCtl, 0);
+ LWLock *lock;
+ int64 pageno;
+
+ pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+ lock = SimpleLruGetBankLock(SubTransCtl, pageno);
LWLockAcquire(lock, LW_EXCLUSIVE);
/* Create and zero the first page of the subtrans log */
- slotno = ZeroSUBTRANSPage(0);
+ slotno = ZeroSUBTRANSPage(pageno);
/* Make sure it's written out */
SimpleLruWritePage(SubTransCtl, slotno);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f58412bca..c61d7d967c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -136,6 +136,10 @@ int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
+TransactionId start_xid = FirstNormalTransactionId;
+MultiXactId start_mxid = FirstMultiXactId;
+MultiXactOffset start_mxoff = 0;
+
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
#endif
@@ -5080,13 +5084,14 @@ BootStrapXLOG(uint32 data_checksum_version)
checkPoint.fullPageWrites = fullPageWrites;
checkPoint.wal_level = wal_level;
checkPoint.nextXid =
- FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
+ FullTransactionIdFromEpochAndXid(0, Max(FirstNormalTransactionId,
+ start_xid));
checkPoint.nextOid = FirstGenbkiObjectId;
- checkPoint.nextMulti = FirstMultiXactId;
- checkPoint.nextMultiOffset = 0;
- checkPoint.oldestXid = FirstNormalTransactionId;
+ checkPoint.nextMulti = Max(FirstMultiXactId, start_mxid);
+ checkPoint.nextMultiOffset = start_mxoff;
+ checkPoint.oldestXid = XidFromFullTransactionId(checkPoint.nextXid);
checkPoint.oldestXidDB = Template1DbOid;
- checkPoint.oldestMulti = FirstMultiXactId;
+ checkPoint.oldestMulti = checkPoint.nextMulti;
checkPoint.oldestMultiDB = Template1DbOid;
checkPoint.oldestCommitTsXid = InvalidTransactionId;
checkPoint.newestCommitTsXid = InvalidTransactionId;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index d31a67599c..8c33b8ba9d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -217,7 +217,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
argv++;
argc--;
- while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fkm:o:r:X:x:-:")) != -1)
{
switch (flag)
{
@@ -272,12 +272,60 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'k':
bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
case 'r':
strlcpy(OutputFileName, optarg, MAXPGPATH);
break;
case 'X':
SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid value")));
+ }
+ }
+ break;
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..6a3224bb82 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -358,12 +358,18 @@ help(const char *progname)
printf(_(" -E echo statement before execution\n"));
printf(_(" -j do not use newline as interactive query delimiter\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nOptions for bootstrapping mode:\n"));
printf(_(" --boot selects bootstrapping mode (must be first argument)\n"));
printf(_(" --check selects check mode (must be first argument)\n"));
printf(_(" DBNAME database name (mandatory argument in bootstrapping mode)\n"));
printf(_(" -r FILENAME send stdout and stderr to given file\n"));
+ printf(_(" -m START_MXID set initial database cluster multixact id\n"));
+ printf(_(" -o START_MXOFF set initial database cluster multixact offset\n"));
+ printf(_(" -x START_XID set initial database cluster xid\n"));
printf(_("\nPlease read the documentation for the complete list of run-time\n"
"configuration settings and how to set them on the command line or in\n"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 78e66a06ac..483307279f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -572,7 +572,7 @@ PostmasterMain(int argc, char *argv[])
* tcop/postgres.c (the option sets should not conflict) and with the
* common help() function in main/main.c.
*/
- while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:OPp:r:S:sTt:W:-:")) != -1)
+ while ((opt = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:Oo:Pp:r:S:sTt:W:x:-:")) != -1)
{
switch (opt)
{
@@ -669,10 +669,18 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("max_connections", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'm':
+ /* only used by single-user backend */
+ break;
+
case 'O':
SetConfigOption("allow_system_table_mods", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'o':
+ /* only used by single-user backend */
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", PGC_POSTMASTER, PGC_S_ARGV);
break;
@@ -723,6 +731,10 @@ PostmasterMain(int argc, char *argv[])
SetConfigOption("post_auth_delay", optarg, PGC_POSTMASTER, PGC_S_ARGV);
break;
+ case 'x':
+ /* only used by single-user backend */
+ break;
+
default:
write_stderr("Try \"%s --help\" for more information.\n",
progname);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 184b830168..4fd594cfe5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3918,7 +3918,7 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
* postmaster/postmaster.c (the option sets should not conflict) and with
* the common help() function in main/main.c.
*/
- while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lN:nOPp:r:S:sTt:v:W:-:")) != -1)
+ while ((flag = getopt(argc, argv, "B:bC:c:D:d:EeFf:h:ijk:lm:N:nOo:Pp:r:S:sTt:v:W:x:-:")) != -1)
{
switch (flag)
{
@@ -4010,6 +4010,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("ssl", "true", ctx, gucsource);
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact id")));
+ }
+ }
+ break;
+
case 'N':
SetConfigOption("max_connections", optarg, ctx, gucsource);
break;
@@ -4022,6 +4039,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("allow_system_table_mods", "true", ctx, gucsource);
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster multixact offset")));
+ }
+ }
+ break;
+
case 'P':
SetConfigOption("ignore_system_indexes", "true", ctx, gucsource);
break;
@@ -4076,6 +4110,23 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx,
SetConfigOption("post_auth_delay", optarg, ctx, gucsource);
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid initial database cluster xid")));
+ }
+ }
+ break;
+
default:
errs++;
break;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783..410868dddf 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,9 @@ static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static TransactionId start_xid = 0;
+static MultiXactId start_mxid = 0;
+static MultiXactOffset start_mxoff = 0;
/* internal vars */
@@ -1568,6 +1571,11 @@ bootstrap_template1(void)
bki_lines = replace_token(bki_lines, "POSTGRES",
escape_quotes_bki(username));
+ /* relfrozenxid must not be less than FirstNormalTransactionId */
+ sprintf(buf, "%llu", (unsigned long long) Max(start_xid, 3));
+ bki_lines = replace_token(bki_lines, "RECENTXMIN",
+ buf);
+
bki_lines = replace_token(bki_lines, "ENCODING",
encodingid_to_string(encodingid));
@@ -1593,6 +1601,9 @@ bootstrap_template1(void)
printfPQExpBuffer(&cmd, "\"%s\" --boot %s %s", backend_exec, boot_options, extra_options);
appendPQExpBuffer(&cmd, " -X %d", wal_segment_size_mb * (1024 * 1024));
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
if (data_checksums)
appendPQExpBuffer(&cmd, " -k");
if (debug)
@@ -2532,12 +2543,20 @@ usage(const char *progname)
printf(_(" -d, --debug generate lots of debugging output\n"));
printf(_(" --discard-caches set debug_discard_caches=1\n"));
printf(_(" -L DIRECTORY where to find the input files\n"));
+ printf(_(" -m, --multixact-id=START_MXID\n"
+ " set initial database cluster multixact id\n"
+ " max value is 2^32-1\n"));
printf(_(" -n, --no-clean do not clean up after errors\n"));
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
printf(_(" --no-instructions do not print instructions for next steps\n"));
+ printf(_(" -o, --multixact-offset=START_MXOFF\n"
+ " set initial database cluster multixact offset\n"
+ " max value is 2^32-1\n"));
printf(_(" -s, --show show internal settings, then exit\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
printf(_(" -S, --sync-only only sync database files to disk, then exit\n"));
+ printf(_(" -x, --xid=START_XID set initial database cluster xid\n"
+ " max value is 2^32-1\n"));
printf(_("\nOther options:\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
@@ -3079,6 +3098,18 @@ initialize_data_directory(void)
/* Now create all the text config files */
setup_config();
+ if (start_mxid != 0)
+ printf(_("selecting initial multixact id ... %llu\n"),
+ (unsigned long long) start_mxid);
+
+ if (start_mxoff != 0)
+ printf(_("selecting initial multixact offset ... %llu\n"),
+ (unsigned long long) start_mxoff);
+
+ if (start_xid != 0)
+ printf(_("selecting initial xid ... %llu\n"),
+ (unsigned long long) start_xid);
+
/* Bootstrap template1 */
bootstrap_template1();
@@ -3095,8 +3126,12 @@ initialize_data_directory(void)
fflush(stdout);
initPQExpBuffer(&cmd);
- printfPQExpBuffer(&cmd, "\"%s\" %s %s template1 >%s",
- backend_exec, backend_options, extra_options, DEVNULL);
+ printfPQExpBuffer(&cmd, "\"%s\" %s %s",
+ backend_exec, backend_options, extra_options);
+ appendPQExpBuffer(&cmd, " -m %llu", (unsigned long long) start_mxid);
+ appendPQExpBuffer(&cmd, " -o %llu", (unsigned long long) start_mxoff);
+ appendPQExpBuffer(&cmd, " -x %llu", (unsigned long long) start_xid);
+ appendPQExpBuffer(&cmd, " template1 >%s", DEVNULL);
PG_CMD_OPEN(cmd.data);
@@ -3183,6 +3218,9 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+ {"xid", required_argument, NULL, 'x'},
+ {"multixact-id", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'o'},
{NULL, 0, NULL, 0}
};
@@ -3224,7 +3262,7 @@ main(int argc, char *argv[])
/* process command-line options */
- while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:",
+ while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:m:nNo:sST:U:Wx:X:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -3282,6 +3320,30 @@ main(int argc, char *argv[])
debug = true;
printf(_("Running in debug mode.\n"));
break;
+ case 'm':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactIdIsValid(start_mxid))
+ {
+ pg_log_error("invalid initial database cluster multixact id");
+ exit(1);
+ }
+ else if (start_mxid < 1) /* FirstMultiXactId */
+ {
+ /*
+ * Avoid silently setting mxid to
+ * FirstMultiXactId, though doing so would be harmless.
+ */
+ pg_log_error("multixact id should be greater than 0");
+ exit(1);
+ }
+ }
+ break;
case 'n':
noclean = true;
printf(_("Running in no-clean mode. Mistakes will not be cleaned up.\n"));
@@ -3289,6 +3351,21 @@ main(int argc, char *argv[])
case 'N':
do_sync = false;
break;
+ case 'o':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_mxoff = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartMultiXactOffsetIsValid(start_mxoff))
+ {
+ pg_log_error("invalid initial database cluster multixact offset");
+ exit(1);
+ }
+ }
+ break;
case 'S':
sync_only = true;
break;
@@ -3377,6 +3454,30 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+ case 'x':
+ {
+ char *endptr;
+
+ errno = 0;
+ start_xid = strtou64(optarg, &endptr, 0);
+
+ if (endptr == optarg || *endptr != '\0' || errno != 0 ||
+ !StartTransactionIdIsValid(start_xid))
+ {
+ pg_log_error("invalid value for initial database cluster xid");
+ exit(1);
+ }
+ else if (start_xid < 3) /* FirstNormalTransactionId */
+ {
+ /*
+ * Avoid silently setting xid to
+ * FirstNormalTransactionId, though doing so would be harmless.
+ */
+ pg_log_error("xid should be greater than 2");
+ exit(1);
+ }
+ }
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 7520d3d0dd..91a85d9f4d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -282,4 +282,64 @@ command_fails(
[ 'pg_checksums', '-D', $datadir_nochecksums ],
"pg_checksums fails with data checksum disabled");
+# Set non-standard initial mxid/mxoff/xid.
+command_fails_like(
+ [ 'initdb', '-m', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact id/,
+ 'fails for invalid initial database cluster multixact id');
+command_fails_like(
+ [ 'initdb', '-o', 'seven', $datadir ],
+ qr/initdb: error: invalid initial database cluster multixact offset/,
+ 'fails for invalid initial database cluster multixact offset');
+command_fails_like(
+ [ 'initdb', '-x', 'seven', $datadir ],
+ qr/initdb: error: invalid value for initial database cluster xid/,
+ 'fails for invalid initial database cluster xid');
+
+command_checks_all(
+ [ 'initdb', '-m', '65535', "$tempdir/data-m65535" ],
+ 0,
+ [qr/selecting initial multixact id ... 65535/],
+ [],
+ 'selecting initial multixact id');
+command_checks_all(
+ [ 'initdb', '-o', '65535', "$tempdir/data-o65535" ],
+ 0,
+ [qr/selecting initial multixact offset ... 65535/],
+ [],
+ 'selecting initial multixact offset');
+command_checks_all(
+ [ 'initdb', '-x', '65535', "$tempdir/data-x65535" ],
+ 0,
+ [qr/selecting initial xid ... 65535/],
+ [],
+ 'selecting initial xid');
+
+# Setup new cluster with given mxid/mxoff/xid.
+my $node;
+my $result;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxid');
+$node->init(extra => ['-m', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multixact_id FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxid');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-mxoff');
+$node->init(extra => ['-o', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT next_multi_offset FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given mxoff');
+$node->stop;
+
+$node = PostgreSQL::Test::Cluster->new('test-xid');
+$node->init(extra => ['-x', '16777215']); # 0xFFFFFF
+$node->start;
+$result = $node->safe_psql('postgres', "SELECT txid_current();");
+ok($result >= 16777215, 'setup cluster with given xid - check 1');
+$result = $node->safe_psql('postgres', "SELECT oldest_xid FROM pg_control_checkpoint();");
+ok($result >= 16777215, 'setup cluster with given xid - check 2');
+$node->stop;
+
done_testing();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34ad46c067..4ce79b12e3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -94,6 +94,9 @@ typedef enum RecoveryState
} RecoveryState;
extern PGDLLIMPORT int wal_level;
+extern PGDLLIMPORT TransactionId start_xid;
+extern PGDLLIMPORT MultiXactId start_mxid;
+extern PGDLLIMPORT MultiXactOffset start_mxoff;
/* Is WAL archiving enabled (always or only while server is running normally)? */
#define XLogArchivingActive() \
diff --git a/src/include/c.h b/src/include/c.h
index e1b3187d0b..f770e9a140 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -668,6 +668,10 @@ typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
+#define StartTransactionIdIsValid(xid) ((xid) <= 0xFFFFFFFF)
+#define StartMultiXactIdIsValid(mxid) ((mxid) <= 0xFFFFFFFF)
+#define StartMultiXactOffsetIsValid(offset) ((offset) <= 0xFFFFFFFF)
+
#define FirstCommandId ((CommandId) 0)
#define InvalidCommandId (~(CommandId)0)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 0fc2c093b0..0a7518df0d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -123,7 +123,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
Oid relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
/* all Xids < this are frozen in this rel */
- TransactionId relfrozenxid BKI_DEFAULT(3); /* FirstNormalTransactionId */
+ TransactionId relfrozenxid BKI_DEFAULT(RECENTXMIN); /* FirstNormalTransactionId */
/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid BKI_DEFAULT(1); /* FirstMultiXactId */
--
2.43.0
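As a reading aid for the -m/-o/-x handling above, here is a minimal standalone sketch of the validation pattern the patch repeats in bootstrap.c, postgres.c and initdb.c. It is not code from the patch: parse_start_value() and clamp_start_value() are hypothetical names, plain strtoull() stands in for strtou64(), and START_VALUE_MAX simply copies the 0xFFFFFFFF bound used by the Start*IsValid() macros.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Same bound as the Start*IsValid() macros added to c.h in this patch. */
#define START_VALUE_MAX UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical helper: parse one -m/-o/-x argument.
 * Returns true and stores the value on success; returns false on any
 * malformed or out-of-range input.
 */
static bool
parse_start_value(const char *arg, uint64_t *out)
{
	char	   *endptr;
	unsigned long long val;

	assert(arg != NULL && out != NULL);

	errno = 0;
	val = strtoull(arg, &endptr, 0);	/* base 0: decimal, 0x hex, 0 octal */

	/* no digits consumed, trailing garbage, or overflow reported */
	if (endptr == arg || *endptr != '\0' || errno != 0)
		return false;

	/* range check on the full 64-bit value, before any narrowing */
	if (val > START_VALUE_MAX)
		return false;

	*out = (uint64_t) val;
	return true;
}

/*
 * Hypothetical helper: mirrors the Max(First..., start_...) clamp in
 * BootStrapXLOG, so an unset (zero) start value falls back to the first
 * valid one.
 */
static uint64_t
clamp_start_value(uint64_t requested, uint64_t first_valid)
{
	return (requested > first_valid) ? requested : first_valid;
}
```

With this sketch, "seven" is rejected and "65535" is accepted, matching what the TAP tests above expect from initdb. The sketch keeps the parsed value in a 64-bit variable until after the range check; whether the patch should do the same before assigning to start_xid or start_mxid is a question for the author, not something this example settles.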
[application/octet-stream] v9-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch (8.8K, 4-v9-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch)
From d703fe4538754534817596a0d4f51e06a8c3293f Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 23 Oct 2024 18:23:39 +0300
Subject: [PATCH v9 4/7] Get rid of MultiXactMemberFreezeThreshold call.
Since MaxMultiXactOffset is now UINT64_MAX, the MULTIXACT_MEMBER_SAFE_THRESHOLD and
MULTIXACT_MEMBER_DANGER_THRESHOLD values are no longer meaningful. Thus,
MultiXactMemberFreezeThreshold is not needed either.
Instead, switch to a MULTIXACT_MEMBER_AUTOVAC_THRESHOLD (2^32 - 1) members
threshold. It is used to determine whether we need to force autovacuum.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 117 +++----------------------
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/include/access/multixact.h | 1 -
4 files changed, 15 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 48e1c0160a..a817f539ee 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -204,10 +204,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release disk space and, if possible, reduce
+ * table bloat.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -2616,15 +2620,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine whether we need to vacuum to reclaim member space.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2712,10 +2714,10 @@ SetOffsetVacuumLimit(bool is_startup)
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2761,101 +2763,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return true;
}
-/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
- */
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
-{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
-
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
-
- if (!oldestOffsetKnown)
- return false;
-
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
-}
-
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
-
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 86f36b3695..e7506e268a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,7 +1133,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..180bb7e96e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1122,7 +1122,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1915,7 +1915,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 90583634ec..5aefbddce3 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -143,7 +143,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(TransactionId xid, uint16 info,
void *recdata, uint32 len);
--
2.43.0
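To make the new return condition of SetOffsetVacuumLimit easier to reason about, here is a small standalone sketch of the same predicate. It is only an illustration: members_need_autovacuum() and MEMBER_AUTOVAC_THRESHOLD are hypothetical names, and plain uint64_t stands in for MultiXactOffset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as MULTIXACT_MEMBER_AUTOVAC_THRESHOLD in the patch. */
#define MEMBER_AUTOVAC_THRESHOLD UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical helper: decide whether autovacuum should be requested.
 * If the oldest offset is unknown we assume yes; otherwise we compare the
 * number of members in use against the threshold.
 */
static bool
members_need_autovacuum(bool oldest_known,
						uint64_t next_offset,
						uint64_t oldest_offset)
{
	return !oldest_known ||
		(next_offset - oldest_offset > MEMBER_AUTOVAC_THRESHOLD);
}

/* A few boundary cases, written as assertions. */
static void
threshold_examples(void)
{
	/* unknown oldest offset: always request autovacuum */
	assert(members_need_autovacuum(false, 0, 0));

	/* exactly at the threshold: not yet */
	assert(!members_need_autovacuum(true, UINT64_C(0xFFFFFFFF), 0));

	/* one member past the threshold: request autovacuum */
	assert(members_need_autovacuum(true, UINT64_C(0x100000000), 0));
}
```

The comparison is strict, so autovacuum is requested only once more than 2^32 - 1 members are in use. Unlike the removed MultiXactMemberFreezeThreshold, the predicate gives a yes/no answer and does not scale the effective freeze age with member usage.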
[application/octet-stream] v9-0002-Use-64-bit-multixact-offsets.patch (12.9K, 5-v9-0002-Use-64-bit-multixact-offsets.patch)
From 8cc5477a23b383132fddd4386492c0ffe6b63fb7 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 6 Mar 2024 11:11:33 +0300
Subject: [PATCH v9 2/7] Use 64-bit multixact offsets.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/transam/multixact.c | 170 +------------------------
src/bin/pg_resetwal/pg_resetwal.c | 2 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 2 +-
src/include/c.h | 2 +-
5 files changed, 10 insertions(+), 168 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ab90912ed3..48e1c0160a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -96,14 +96,6 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
/* We need four bytes per offset */
@@ -272,9 +264,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -409,8 +398,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMZeroPageXlogRec(int64 pageno, uint8 info);
@@ -1164,78 +1151,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -2721,8 +2636,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2737,7 +2650,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2768,11 +2680,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2785,24 +2693,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2812,14 +2703,12 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
@@ -2829,54 +2718,6 @@ SetOffsetVacuumLimit(bool is_startup)
(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
}
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
-}
-
/*
* Find the starting offset of the given MultiXactId.
*
@@ -2998,8 +2839,9 @@ MultiXactMemberFreezeThreshold(void)
* we try to eliminate from the system is based on how far we are past
* MULTIXACT_MEMBER_SAFE_THRESHOLD.
*/
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+ fraction /= (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
+
victim_multixacts = multixacts * fraction;
/* fraction could be > 1.0, but lowest possible freeze age is zero */
@@ -3345,7 +3187,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 985cd06802..1af2ce4b93 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 9829e48106..f8a8eef44d 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -206,7 +206,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# -m argument is "new,old"
push @cmd, '-m',
sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 7ffd256c74..90583634ec 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -27,7 +27,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
diff --git a/src/include/c.h b/src/include/c.h
index 0a548d69d7..e1b3187d0b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -664,7 +664,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
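The change to MultiXactOffsetPrecedes() above keeps the modular "precedes" comparison but widens it to 64 bits. A minimal standalone sketch of that comparison follows. It assumes a two's-complement platform, and `offset_precedes` plus the local typedef are hypothetical stand-ins, not names from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the widened typedef in c.h. */
typedef uint64_t MultiXactOffset;

/*
 * Wraparound-aware "precedes" test.  The unsigned difference is
 * reinterpreted as a signed 64-bit value, so offset1 precedes offset2
 * whenever it is fewer than 2^63 steps behind it in modular arithmetic.
 */
static bool
offset_precedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
	int64_t		diff = (int64_t) (offset1 - offset2);

	return diff < 0;
}
```

With 64-bit offsets the modular case is unlikely to be reached in practice, but the comparison still treats UINT64_MAX as preceding small values, the same way the 32-bit version treated 0xFFFFFFFF.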
[application/octet-stream] v9-0001-Use-64-bit-format-output-for-multixact-offsets.patch (9.0K, 6-v9-0001-Use-64-bit-format-output-for-multixact-offsets.patch)
From bc77e08c2afae2d0e4ae9222dfff1a77ef2b3f18 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v9 1/7] Use 64-bit format output for multixact offsets
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 9 ++++----
src/backend/access/rmgrdesc/xlogdesc.c | 4 ++--
src/backend/access/transam/multixact.c | 26 +++++++++++++----------
src/backend/access/transam/xlogrecovery.c | 5 +++--
src/bin/pg_controldata/pg_controldata.c | 4 ++--
src/bin/pg_resetwal/pg_resetwal.c | 8 +++----
6 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3e8ad4d5ef..1b486de38c 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,8 +65,8 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
- xlrec->moff, xlrec->nmembers);
+ appendStringInfo(buf, "%u offset %llu nmembers %d: ", xlrec->mid,
+ (unsigned long long) xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
}
@@ -74,9 +74,10 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%llu, %llu)",
xlrec->startTruncOff, xlrec->endTruncOff,
- xlrec->startTruncMemb, xlrec->endTruncMemb);
+ (unsigned long long) xlrec->startTruncMemb,
+ (unsigned long long) xlrec->endTruncMemb);
}
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 363294d623..aaa19c81c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %llu; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
@@ -79,7 +79,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
XidFromFullTransactionId(checkpoint->nextXid),
checkpoint->nextOid,
checkpoint->nextMulti,
- checkpoint->nextMultiOffset,
+ (unsigned long long) checkpoint->nextMultiOffset,
checkpoint->oldestXid,
checkpoint->oldestXidDB,
checkpoint->oldestMulti,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 8c37d7eba7..ab90912ed3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1264,7 +1264,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %llu", result,
+ (unsigned long long) *offset);
return result;
}
@@ -2293,8 +2294,9 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
- *nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %llu, oldestMulti %u in DB %u",
+ *nextMulti, (unsigned long long) *nextMultiOffset, *oldestMulti,
+ *oldestMultiDB);
}
/*
@@ -2328,8 +2330,8 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
- nextMulti, nextMultiOffset);
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %llu",
+ nextMulti, (unsigned long long) nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
MultiXactState->nextOffset = nextMultiOffset;
@@ -2519,8 +2521,8 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
- minMultiOffset);
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %llu",
+ (unsigned long long) minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
LWLockRelease(MultiXactGenLock);
@@ -3211,11 +3213,12 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
oldestMulti, newOldestMulti,
(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
- oldestOffset, newOldestOffset,
+ (unsigned long long) oldestOffset,
+ (unsigned long long) newOldestOffset,
(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
@@ -3471,11 +3474,12 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%llx, %llx), "
- "members [%u, %u), members segments [%llx, %llx)",
+ "members [%llu, %llu), members segments [%llx, %llx)",
xlrec.startTruncOff, xlrec.endTruncOff,
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.startTruncOff),
(unsigned long long) MultiXactIdToOffsetSegment(xlrec.endTruncOff),
- xlrec.startTruncMemb, xlrec.endTruncMemb,
+ (unsigned long long) xlrec.startTruncMemb,
+ (unsigned long long) xlrec.endTruncMemb,
(unsigned long long) MXOffsetToMemberSegment(xlrec.startTruncMemb),
(unsigned long long) MXOffsetToMemberSegment(xlrec.endTruncMemb));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 05c738d661..727b6e744f 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -876,8 +876,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
- checkPoint.nextMulti, checkPoint.nextMultiOffset)));
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %llu",
+ checkPoint.nextMulti,
+ (unsigned long long) checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
checkPoint.oldestXid, checkPoint.oldestXidDB)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..43b6727570 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -253,8 +253,8 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile->checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..985cd06802 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -737,8 +737,8 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("Latest checkpoint's NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
printf(_("Latest checkpoint's oldestXID's DB: %u\n"),
@@ -809,8 +809,8 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
- ControlFile.checkPointCopy.nextMultiOffset);
+ printf(_("NextMultiOffset: %llu\n"),
+ (unsigned long long) ControlFile.checkPointCopy.nextMultiOffset);
}
if (set_oid != 0)
--
2.43.0
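Every hunk in the patch above follows one pattern: print the offset with %llu and cast the argument to unsigned long long, so the format string does not depend on how the 64-bit typedef is defined on a given platform. A small sketch of the pattern, using only standard C. `format_next_offset` and the local typedef are illustrative names, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the 64-bit MultiXactOffset typedef. */
typedef uint64_t MultiXactOffset;

/*
 * Format an offset the way the patched printf()/elog() calls do:
 * %llu in the format string, explicit cast on the argument.
 * Returns the snprintf() result.
 */
static int
format_next_offset(char *buf, size_t buflen, MultiXactOffset offset)
{
	return snprintf(buf, buflen, "NextMultiOffset: %llu",
					(unsigned long long) offset);
}
```

Passing a uint64 value to a plain %u conversion would be undefined behaviour, which is why the format strings and the argument casts change together.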
[application/octet-stream] v9-0003-Make-pg_upgrade-convert-multixact-offsets.patch (17.9K, 7-v9-0003-Make-pg_upgrade-convert-multixact-offsets.patch)
From d731c49b8c51d57ee4ae0160a4668f9f99d4a2bc Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 13 Aug 2024 14:44:50 +0300
Subject: [PATCH v9 3/7] Make pg_upgrade convert multixact offsets.
Author: Maxim Orlov <[email protected]>
Author: Yura Sokolov <[email protected]>
---
src/bin/pg_upgrade/Makefile | 1 +
src/bin/pg_upgrade/meson.build | 1 +
src/bin/pg_upgrade/pg_upgrade.c | 42 ++-
src/bin/pg_upgrade/pg_upgrade.h | 14 +-
src/bin/pg_upgrade/segresize.c | 527 ++++++++++++++++++++++++++++++++
5 files changed, 580 insertions(+), 5 deletions(-)
create mode 100644 src/bin/pg_upgrade/segresize.c
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d30..70908d63a3 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -21,6 +21,7 @@ OBJS = \
info.o \
option.o \
parallel.o \
+ segresize.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 3d88419674..16f898ba14 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -10,6 +10,7 @@ pg_upgrade_sources = files(
'info.c',
'option.c',
'parallel.c',
+ 'segresize.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..1654e877c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -750,8 +750,42 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
- copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ /*
+ * If the old cluster predates MULTIXACTOFFSET_FORMATCHANGE_CAT_VER, it
+ * has 32-bit multixid offsets, which must be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ MultiXactOffset oldest_offset,
+ next_offset;
+
+ remove_new_subdir("pg_multixact/offsets", false);
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+ oldest_offset = convert_multixact_offsets();
+ check_ok();
+
+ remove_new_subdir("pg_multixact/members", false);
+ prep_status("Converting pg_multixact/members");
+ convert_multixact_members(oldest_offset);
+ check_ok();
+
+ next_offset = old_cluster.controldata.chkpnt_nxtmxoff;
+ if (oldest_offset)
+ {
+ if (next_offset < oldest_offset)
+ next_offset += ((MultiXactOffset) 1 << 32) - 1;
+
+ next_offset -= oldest_offset - 1;
+
+ old_cluster.controldata.chkpnt_nxtmxoff = next_offset;
+ }
+ }
+ else
+ {
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ }
prep_status("Setting next multixact ID and offset for new cluster");
@@ -760,9 +794,9 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
+ "\"%s/pg_resetwal\" -O %llu -m %u,%u \"%s\"",
new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
+ (unsigned long long) old_cluster.controldata.chkpnt_nxtmxoff,
old_cluster.controldata.chkpnt_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..2c85ec1e94 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,13 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Switch from 32-bit to 64-bit multixid offsets.
+ *
+ * XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
@@ -230,7 +237,7 @@ typedef struct
uint32 chkpnt_nxtepoch;
uint32 chkpnt_nxtoid;
uint32 chkpnt_nxtmulti;
- uint32 chkpnt_nxtmxoff;
+ uint64 chkpnt_nxtmxoff;
uint32 chkpnt_oldstMulti;
uint32 chkpnt_oldstxid;
uint32 align;
@@ -515,3 +522,8 @@ typedef struct
FILE *file;
char path[MAXPGPATH];
} UpgradeTaskReport;
+
+/* segresize.c */
+
+MultiXactOffset convert_multixact_offsets(void);
+void convert_multixact_members(MultiXactOffset oldest_offset);
diff --git a/src/bin/pg_upgrade/segresize.c b/src/bin/pg_upgrade/segresize.c
new file mode 100644
index 0000000000..73064c77de
--- /dev/null
+++ b/src/bin/pg_upgrade/segresize.c
@@ -0,0 +1,527 @@
+/*
+ * segresize.c
+ *
+ * SLRU segment resize utility
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/segresize.c
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_upgrade.h"
+#include "access/multixact.h"
+
+/* See slru.h */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+/*
+ * An iterator over the pages of an SLRU segment: set the starting segment and
+ * page number, then step through the pages one at a time.
+ */
+typedef struct SlruSegState
+{
+ char *dir;
+ char *fn;
+ FILE *file;
+ int64 segno;
+ uint64 pageno;
+ bool leading_gap;
+} SlruSegState;
+
+/*
+ * Mirrors the SlruFileName from slru.c
+ */
+static inline char *
+SlruFileName(SlruSegState *state)
+{
+ Assert(state->segno >= 0 && state->segno <= INT64CONST(0xFFFFFF));
+ return psprintf("%s/%04X", state->dir, (unsigned int) state->segno);
+}
+
+/*
+ * Create new SLRU segment file.
+ */
+static void
+create_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "wb");
+ if (!state->file)
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+}
+
+/*
+ * Open existing SLRU segment file.
+ */
+static void
+open_segment(SlruSegState *state)
+{
+ Assert(state->fn == NULL);
+ Assert(state->file == NULL);
+
+ state->fn = SlruFileName(state);
+ state->file = fopen(state->fn, "rb");
+ if (!state->file)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+}
+
+/*
+ * Close SLRU segment file.
+ */
+static void
+close_segment(SlruSegState *state)
+{
+ if (state->file)
+ {
+ fclose(state->file);
+ state->file = NULL;
+ }
+
+ if (state->fn)
+ {
+ pfree(state->fn);
+ state->fn = NULL;
+ }
+}
+
+/*
+ * Read next page from the old 32-bit offset segment file.
+ */
+static int
+read_old_segment_page(SlruSegState *state, void *buf, bool *empty)
+{
+ int len;
+
+ /* Open next segment file, if needed. */
+ if (!state->fn)
+ {
+ if (!state->segno)
+ state->leading_gap = true;
+
+ open_segment(state);
+
+ /* Set position to the needed page. */
+ if (state->pageno > 0 &&
+ fseek(state->file, state->pageno * BLCKSZ, SEEK_SET))
+ {
+ close_segment(state);
+ }
+ }
+
+ if (state->file)
+ {
+ /* The segment file exists; read a page from it. */
+ state->leading_gap = false;
+
+ len = fread(buf, sizeof(char), BLCKSZ, state->file);
+
+ /* Are we done or was there an error? */
+ if (len <= 0)
+ {
+ if (ferror(state->file))
+ pg_fatal("error reading file \"%s\": %m", state->fn);
+
+ if (feof(state->file))
+ {
+ *empty = true;
+ len = -1;
+
+ close_segment(state);
+ }
+ }
+ else
+ *empty = false;
+ }
+ else if (!state->leading_gap)
+ {
+ /* We reached the last segment. */
+ len = -1;
+ *empty = true;
+ }
+ else
+ {
+ /* Skip the first few segments if they were frozen and removed. */
+ len = BLCKSZ;
+ *empty = true;
+ }
+
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+
+ return len;
+}
+
+/*
+ * Write next page to the new 64-bit offset segment file.
+ */
+static void
+write_new_segment_page(SlruSegState *state, void *buf)
+{
+ /*
+ * Create a new segment file if we have not done so yet. Creation is
+ * postponed until the first non-empty page is found, which avoids
+ * creating completely empty segments.
+ */
+ if (!state->file)
+ {
+ create_segment(state);
+
+ /* Write zeroes to the previously skipped prefix. */
+ if (state->pageno > 0)
+ {
+ char zerobuf[BLCKSZ] = {0};
+
+ for (int64 i = 0; i < state->pageno; i++)
+ {
+ if (fwrite(zerobuf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+ }
+ }
+
+ /* Write page to the new segment (if it was created). */
+ if (state->file)
+ {
+ if (fwrite(buf, sizeof(char), BLCKSZ, state->file) != BLCKSZ)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ /*
+ * Did we reach the maximum page number? Then close segment file
+ * and create a new one on the next iteration.
+ */
+ if (++state->pageno >= SLRU_PAGES_PER_SEGMENT)
+ {
+ /* Start a new segment. */
+ state->segno++;
+ state->pageno = 0;
+
+ close_segment(state);
+ }
+}
+
+typedef uint32 MultiXactOffsetOld;
+
+#define MaxMultiXactOffsetOld ((MultiXactOffsetOld) 0xFFFFFFFF)
+
+#define MULTIXACT_OFFSETS_PER_PAGE_OLD (BLCKSZ / sizeof(MultiXactOffsetOld))
+#define MULTIXACT_OFFSETS_PER_PAGE_NEW (BLCKSZ / sizeof(MultiXactOffset))
+
+/*
+ * Convert pg_multixact/offsets segments and return oldest multi offset.
+ */
+MultiXactOffset
+convert_multixact_offsets(void)
+{
+ SlruSegState oldseg = {0},
+ newseg = {0};
+ MultiXactOffsetOld oldbuf[MULTIXACT_OFFSETS_PER_PAGE_OLD] = {0};
+ MultiXactOffset newbuf[MULTIXACT_OFFSETS_PER_PAGE_NEW] = {0},
+ oldest_offset = 0;
+ uint64 oldest_multi = old_cluster.controldata.chkpnt_oldstMulti,
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti,
+ multi,
+ old_entry,
+ new_entry;
+ bool oldest_offset_known = false;
+
+ oldseg.dir = psprintf("%s/pg_multixact/offsets", old_cluster.pgdata);
+ newseg.dir = psprintf("%s/pg_multixact/offsets", new_cluster.pgdata);
+
+ old_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ new_entry = oldest_multi % MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.pageno = oldest_multi / MULTIXACT_OFFSETS_PER_PAGE_NEW;
+ newseg.segno = newseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ newseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ if (next_multi < oldest_multi)
+ next_multi += (uint64) 1 << 32; /* wraparound */
+
+ /* Copy multi offsets reading only needed segment pages */
+ for (multi = oldest_multi; multi < next_multi; old_entry = 0)
+ {
+ int oldlen;
+ bool empty;
+
+ /* Handle possible segment wraparound */
+#define OLD_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_OLD / SLRU_PAGES_PER_SEGMENT)
+ if (oldseg.segno > OLD_OFFSET_SEGNO_MAX)
+ {
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+ }
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Save oldest multi offset */
+ if (!oldest_offset_known)
+ {
+ oldest_offset = oldbuf[old_entry];
+ oldest_offset_known = true;
+ }
+
+ /* Skip wrapped-around invalid MultiXactIds */
+ if (multi == (uint64) 1 << 32)
+ {
+ Assert(oldseg.segno == 0);
+ Assert(oldseg.pageno == 1);
+ Assert(old_entry == 0);
+ Assert(new_entry == 0);
+
+ multi += FirstMultiXactId;
+ old_entry = FirstMultiXactId;
+ new_entry = FirstMultiXactId;
+ }
+
+ /* Copy entries to the new page */
+ for (; multi < next_multi && old_entry < MULTIXACT_OFFSETS_PER_PAGE_OLD;
+ multi++, old_entry++)
+ {
+ MultiXactOffset offset = oldbuf[old_entry];
+
+ /* Handle possible offset wraparound (1 becomes 2^32) */
+ if (offset < oldest_offset)
+ offset += ((uint64) 1 << 32) - 1;
+
+ /* Subtract oldest_offset, so new offsets will start from 1 */
+ newbuf[new_entry++] = offset - oldest_offset + 1;
+
+ if (new_entry >= MULTIXACT_OFFSETS_PER_PAGE_NEW)
+ {
+ /* Handle possible segment wraparound */
+#define NEW_OFFSET_SEGNO_MAX \
+ (MaxMultiXactId / MULTIXACT_OFFSETS_PER_PAGE_NEW / SLRU_PAGES_PER_SEGMENT)
+ if (newseg.segno > NEW_OFFSET_SEGNO_MAX)
+ {
+ newseg.segno = 0;
+ newseg.pageno = 0;
+ }
+
+ /* Write new page */
+ write_new_segment_page(&newseg, newbuf);
+ new_entry = 0;
+ }
+ }
+ }
+
+ /* Write the last incomplete page */
+ if (new_entry > 0 || oldest_multi == next_multi)
+ {
+ memset(&newbuf[new_entry], 0,
+ sizeof(newbuf[0]) * (MULTIXACT_OFFSETS_PER_PAGE_NEW - new_entry));
+ write_new_segment_page(&newseg, newbuf);
+ }
+
+ /* Use next_offset as oldest_offset, if oldest_multi == next_multi */
+ if (!oldest_offset_known)
+ {
+ Assert(oldest_multi == next_multi);
+ oldest_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ }
+
+ /* Release resources */
+ close_segment(&oldseg);
+ close_segment(&newseg);
+
+ pfree(oldseg.dir);
+ pfree(newseg.dir);
+
+ return oldest_offset;
+}
+
+#define MXACT_MEMBERS_FLAG_BYTES 1
+
+#define MULTIXACT_MEMBERS_PER_GROUP 4
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (MULTIXACT_MEMBERS_PER_GROUP * (sizeof(TransactionId) + MXACT_MEMBERS_FLAG_BYTES))
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE \
+ (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERS_PER_GROUP * MULTIXACT_MEMBERGROUPS_PER_PAGE)
+#define MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP \
+ (MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP)
+
+typedef struct MultiXactMembersCtx
+{
+ SlruSegState seg;
+ char buf[BLCKSZ];
+ int group;
+ int member;
+ char *flag;
+ TransactionId *xid;
+} MultiXactMembersCtx;
+
+static void
+MultiXactMembersCtxInit(MultiXactMembersCtx *ctx)
+{
+ ctx->seg.dir = psprintf("%s/pg_multixact/members", new_cluster.pgdata);
+
+ ctx->group = 0;
+ ctx->member = 1; /* skip invalid zero offset */
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+
+ ctx->flag += ctx->member;
+ ctx->xid += ctx->member;
+}
+
+static void
+MultiXactMembersCtxAdd(MultiXactMembersCtx *ctx, char flag, TransactionId xid)
+{
+ /* Copy member's xid and flags to the new page */
+ *ctx->flag++ = flag;
+ *ctx->xid++ = xid;
+
+ if (++ctx->member < MULTIXACT_MEMBERS_PER_GROUP)
+ return;
+
+ /* Start next member group */
+ ctx->member = 0;
+
+ if (++ctx->group >= MULTIXACT_MEMBERGROUPS_PER_PAGE)
+ {
+ /* Write current page and start new */
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ ctx->group = 0;
+ memset(ctx->buf, 0, BLCKSZ);
+ }
+
+ ctx->flag = (char *) ctx->buf + ctx->group * MULTIXACT_MEMBERGROUP_SIZE;
+ ctx->xid = (TransactionId *)(ctx->flag + MXACT_MEMBERS_FLAG_BYTES * MULTIXACT_MEMBERS_PER_GROUP);
+}
+
+static void
+MultiXactMembersCtxFinit(MultiXactMembersCtx *ctx)
+{
+ if (ctx->flag > (char *) ctx->buf)
+ write_new_segment_page(&ctx->seg, ctx->buf);
+
+ close_segment(&ctx->seg);
+
+ pfree(ctx->seg.dir);
+}
+
+/*
+ * Convert pg_multixact/members segments; the new offsets start from 1.
+ *
+ */
+void
+convert_multixact_members(MultiXactOffset oldest_offset)
+{
+ MultiXactOffset next_offset,
+ offset;
+ SlruSegState oldseg = {0};
+ char oldbuf[BLCKSZ] = {0};
+ int oldidx;
+ MultiXactMembersCtx newctx = {0};
+
+ oldseg.dir = psprintf("%s/pg_multixact/members", old_cluster.pgdata);
+
+ next_offset = (MultiXactOffset) old_cluster.controldata.chkpnt_nxtmxoff;
+ if (next_offset < oldest_offset)
+ next_offset += ((uint64) 1 << 32) - 1;
+
+ /* Initialize the old starting position */
+ oldseg.pageno = oldest_offset / MULTIXACT_MEMBERS_PER_PAGE;
+ oldseg.segno = oldseg.pageno / SLRU_PAGES_PER_SEGMENT;
+ oldseg.pageno %= SLRU_PAGES_PER_SEGMENT;
+
+ /* Initialize new starting position */
+ MultiXactMembersCtxInit(&newctx);
+
+ /* Iterate through the original directory */
+ oldidx = oldest_offset % MULTIXACT_MEMBERS_PER_PAGE;
+ for (offset = oldest_offset; offset < next_offset;)
+ {
+ bool empty;
+ int oldlen;
+ int ngroups;
+ int oldgroup;
+ int oldmember;
+
+ oldlen = read_old_segment_page(&oldseg, oldbuf, &empty);
+ if (empty || oldlen != BLCKSZ)
+ pg_fatal("cannot read page %llu from file \"%s\": %m",
+ (unsigned long long) oldseg.pageno, oldseg.fn);
+
+ /* Iterate through the old member groups */
+ ngroups = oldlen / MULTIXACT_MEMBERGROUP_SIZE;
+ oldmember = oldidx % MULTIXACT_MEMBERS_PER_GROUP;
+ oldgroup = oldidx / MULTIXACT_MEMBERS_PER_GROUP;
+ while (oldgroup < ngroups && offset < next_offset)
+ {
+ char *oldflag;
+ TransactionId *oldxid;
+ int i;
+
+ oldflag = (char *) oldbuf + oldgroup * MULTIXACT_MEMBERGROUP_SIZE;
+ oldxid = (TransactionId *)(oldflag + MULTIXACT_MEMBER_FLAG_BYTES_PER_GROUP);
+
+ oldxid += oldmember;
+ oldflag += oldmember;
+
+ /* Iterate through the old members */
+ for (i = oldmember;
+ i < MULTIXACT_MEMBERS_PER_GROUP && offset < next_offset;
+ i++)
+ {
+ MultiXactMembersCtxAdd(&newctx, *oldflag++, *oldxid++);
+
+ if (++offset == (uint64) 1 << 32)
+ {
+ Assert(i == MaxMultiXactOffsetOld % MULTIXACT_MEMBERS_PER_GROUP);
+ goto wraparound;
+ }
+ }
+
+ oldgroup++;
+ oldmember = 0;
+ }
+
+ oldidx = 0;
+
+ continue;
+
+wraparound:
+#define SEGNO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE / SLRU_PAGES_PER_SEGMENT)
+#define PAGENO_MAX (MaxMultiXactOffsetOld / MULTIXACT_MEMBERS_PER_PAGE % SLRU_PAGES_PER_SEGMENT)
+ Assert((oldseg.segno == SEGNO_MAX && oldseg.pageno == PAGENO_MAX + 1) ||
+ (oldseg.segno == SEGNO_MAX + 1 && oldseg.pageno == 0));
+
+ /* Switch to segment 0000 */
+ close_segment(&oldseg);
+ oldseg.segno = 0;
+ oldseg.pageno = 0;
+
+ /* skip invalid zero multi offset */
+ oldidx = 1;
+ }
+
+ MultiXactMembersCtxFinit(&newctx);
+
+ /* Release resources */
+ close_segment(&oldseg);
+
+ pfree(oldseg.dir);
+}
--
2.43.0
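
For readers following convert_multixact_members() above, here is a minimal standalone sketch of the arithmetic that maps a member offset to its segment, page, member group and byte positions inside the page. The constants are assumed stock values (8 kB blocks, 32 pages per SLRU segment, 4 members per group with one flag byte each). The macro names, MemberPos and locate_member() are hypothetical and are not part of the patch, whose own headers remain authoritative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stock values; not taken from the patch headers. */
#define BLCKSZ                  8192
#define SLRU_PAGES_PER_SEGMENT  32
#define MEMBERS_PER_GROUP       4
#define FLAG_BYTES_PER_GROUP    4	/* one flag byte per member */
#define XID_BYTES               4	/* 32-bit TransactionId */
#define GROUP_SIZE       (FLAG_BYTES_PER_GROUP + MEMBERS_PER_GROUP * XID_BYTES)
#define GROUPS_PER_PAGE  (BLCKSZ / GROUP_SIZE)
#define MEMBERS_PER_PAGE (GROUPS_PER_PAGE * MEMBERS_PER_GROUP)

/* Hypothetical helper type: where one member lives on disk. */
typedef struct MemberPos
{
	uint64_t	segno;		/* segment file number */
	int			pageno;		/* page within the segment */
	int			group;		/* member group within the page */
	int			member;		/* member within the group */
	int			flag_byte;	/* byte position of the flag in the page */
	int			xid_byte;	/* byte position of the xid in the page */
} MemberPos;

static MemberPos
locate_member(uint64_t offset)
{
	MemberPos	pos;
	uint64_t	page = offset / MEMBERS_PER_PAGE;
	int			idx = (int) (offset % MEMBERS_PER_PAGE);

	pos.segno = page / SLRU_PAGES_PER_SEGMENT;
	pos.pageno = (int) (page % SLRU_PAGES_PER_SEGMENT);
	pos.group = idx / MEMBERS_PER_GROUP;
	pos.member = idx % MEMBERS_PER_GROUP;

	/* Each group stores its flag bytes first, then its xids. */
	pos.flag_byte = pos.group * GROUP_SIZE + pos.member;
	pos.xid_byte = pos.group * GROUP_SIZE + FLAG_BYTES_PER_GROUP +
		pos.member * XID_BYTES;

	/* The xid must fit inside the page. */
	assert(pos.xid_byte + XID_BYTES <= BLCKSZ);
	return pos;
}
```

Under these assumptions a page holds 409 groups, that is 1636 members, so offset 1636 is the first member of page 1 and offset 52352 (1636 * 32) is the first member of segment 0001.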
[text/plain] v9-0007-TEST-bump-catver.patch.txt (1.1K, 8-v9-0007-TEST-bump-catver.patch.txt)
From 33e21cf86b1813a67c699d703ab1f75bcf28a7b1 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 13 Nov 2024 16:34:34 +0300
Subject: [PATCH v9 7/7] TEST: bump catver
---
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/catalog/catversion.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2c85ec1e94..18faedc963 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -119,7 +119,7 @@ extern char *output_files[];
*
* XXX: should be changed to the actual CATALOG_VERSION_NO on commit.
*/
-#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202409041
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 202411112
/*
* large object chunk size added to pg_controldata,
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5dd91e190a..3d09caf5ae 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202411111
+#define CATALOG_VERSION_NO 202411112
#endif
--
2.43.0
[text/plain] v9-0006-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch.txt (13.5K, 9-v9-0006-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch.txt)
From 3558ccb4712d50bcda877474db5c9fd124b6e919 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 19 Nov 2024 17:08:10 +0300
Subject: [PATCH v9 6/7] TEST: add src/bin/pg_upgrade/t/005_offset.pl
---
src/bin/pg_upgrade/t/005_offset.pl | 562 +++++++++++++++++++++++++++++
1 file changed, 562 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/005_offset.pl
diff --git a/src/bin/pg_upgrade/t/005_offset.pl b/src/bin/pg_upgrade/t/005_offset.pl
new file mode 100644
index 0000000000..1cfd8b364a
--- /dev/null
+++ b/src/bin/pg_upgrade/t/005_offset.pl
@@ -0,0 +1,562 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# This pair of calls will create significantly more member segments than offset
+# segments.
+sub prep
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+}
+
+sub fill
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ my $nclients = 50;
+ my $update_every = 90;
+ my @connections = ();
+
+ for (1..$nclients)
+ {
+ my $conn = $node->background_psql('postgres');
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ for (my $i = 0; $i < 20000; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} FOR KEY SHARE");
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+}
+
+# This pair of calls will create roughly the same number of member and
+# offset segments.
+sub prep2
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl}(BAR INT PRIMARY KEY, BAZ INT); " .
+ "CREATE OR REPLACE PROCEDURE MXIDFILLER(N_STEPS INT DEFAULT 1000) " .
+ "LANGUAGE PLPGSQL " .
+ "AS \$\$ " .
+ "BEGIN " .
+ " FOR I IN 1..N_STEPS LOOP " .
+ " UPDATE ${tbl} SET BAZ = RANDOM(1, 1000) " .
+ " WHERE BAR IN (SELECT BAR FROM ${tbl} " .
+ " TABLESAMPLE BERNOULLI(80)); " .
+ " COMMIT; " .
+ " END LOOP; " .
+ "END; \$\$; " .
+ "INSERT INTO ${tbl} (BAR, BAZ) " .
+ "SELECT ID, ID FROM GENERATE_SERIES(1, 1024) ID;");
+}
+
+sub fill2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ $node->safe_psql('postgres',
+ "BEGIN; " .
+ "SELECT * FROM ${tbl} FOR KEY SHARE; " .
+ "PREPARE TRANSACTION 'A'; " .
+ "CALL MXIDFILLER((365 * ${scale})::int); " .
+ "COMMIT PREPARED 'A';");
+}
+
+
+# generate around 2 offset segments and 55 member segments
+sub mxid_gen1
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ prep($node, $tbl);
+ fill($node, $tbl);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# generate around 10 offset segments and 12 member segments
+sub mxid_gen2
+{
+ my $node = shift;
+ my $tbl = shift;
+ my $scale = shift // 1;
+
+ prep2($node, $tbl);
+ fill2($node, $tbl, $scale);
+
+ $node->safe_psql('postgres', q(CHECKPOINT));
+}
+
+# Fetch latest multixact checkpoint values.
+sub multi_bounds
+{
+ my ($node) = @_;
+ my $path = $node->config_data('--bindir');
+ my ($stdout, $stderr) = run_command([
+ $path . '/pg_controldata',
+ $node->data_dir
+ ]);
+ my @control_data = split("\n", $stdout);
+ my $next = undef;
+ my $oldest = undef;
+ my $next_offset = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiXactId:\s*(.*)$/mg)
+ {
+ $next = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's oldestMultiXid:\s*(.*)$/mg)
+ {
+ $oldest = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+ {
+ $next_offset = $1;
+ print ">>> @ node ". $node->name . ", " . $_ . "\n";
+ }
+
+ if (defined($oldest) && defined($next) && defined($next_offset))
+ {
+ last;
+ }
+ }
+
+ die "Latest checkpoint's NextMultiXactId not found in control file!\n"
+ unless defined($next);
+
+ die "Latest checkpoint's oldestMultiXid not found in control file!\n"
+ unless defined($oldest);
+
+ die "Latest checkpoint's NextMultiOffset not found in control file!\n"
+ unless defined($next_offset);
+
+ return ($oldest, $next, $next_offset);
+}
+
+# Create node from existing bins.
+sub create_new_node
+{
+ my ($name, %params) = @_;
+
+ create_node(0, @_);
+}
+
+# Create node from ENV oldinstall
+sub create_old_node
+{
+ my ($name, %params) = @_;
+
+ if (!defined($ENV{oldinstall}))
+ {
+ die "oldinstall is not defined";
+ }
+
+ create_node(1, @_);
+}
+
+sub create_node
+{
+ my ($install_path_from_env, $name, %params) = @_;
+ my $scale = defined $params{scale} ? $params{scale} : 1;
+ my $multi = defined $params{multi} ? $params{multi} : undef;
+ my $offset = defined $params{offset} ? $params{offset} : undef;
+
+ my $node =
+ $install_path_from_env ?
+ PostgreSQL::Test::Cluster->new($name,
+ install_path => $ENV{oldinstall}) :
+ PostgreSQL::Test::Cluster->new($name);
+
+ $node->init(force_initdb => 1,
+ extra => [
+ $multi ? ('-m', $multi) : (),
+ $offset ? ('-o', $offset) : (),
+ ]);
+
+ # Fixup MOX patch quirk
+ if ($multi)
+ {
+ unlink $node->data_dir . '/pg_multixact/offsets/0000';
+ }
+ if ($offset)
+ {
+ unlink $node->data_dir . '/pg_multixact/members/0000';
+ }
+
+ $node->append_conf('postgresql.conf', 'fsync = off');
+ $node->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
+
+ $node->start();
+ mxid_gen2($node, 'FOO', $scale);
+ mxid_gen1($node, 'BAR', $scale);
+ $node->restart();
+ $node->safe_psql('postgres', q(SELECT * FROM FOO)); # just in case...
+ $node->safe_psql('postgres', q(SELECT * FROM BAR));
+ $node->safe_psql('postgres', q(CHECKPOINT));
+ $node->stop();
+
+ return $node;
+}
+
+sub do_upgrade
+{
+ my ($oldnode, $newnode) = @_;
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--check'
+ ],
+ 'run of pg_upgrade --check');
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $oldnode->data_dir,
+ '-D', $newnode->data_dir,
+ '-b', $oldnode->config_data('--bindir'),
+ '-B', $newnode->config_data('--bindir'),
+ '-s', $newnode->host,
+ '-p', $oldnode->port,
+ '-P', $newnode->port,
+ '--copy'
+ ],
+ 'run of pg_upgrade');
+
+ $oldnode->start();
+ $newnode->start();
+
+ my $oldfoo = $oldnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ my $newfoo = $newnode->safe_psql('postgres', q(SELECT * FROM FOO));
+ is($oldfoo, $newfoo, "select foo eq");
+
+ my $oldbar = $oldnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ my $newbar = $newnode->safe_psql('postgres', q(SELECT * FROM BAR));
+ is($oldbar, $newbar, "select bar eq");
+
+ $oldnode->stop();
+ $newnode->stop();
+
+ multi_bounds($oldnode);
+ multi_bounds($newnode);
+}
+
+my @TESTS = (
+ # tests without ENV oldinstall
+ 0, 1, 2, 3, 4, 5, 6,
+ # tests with "real" pg_upgrade
+ 100, 101, 102, 103, 104, 105, 106,
+ # self upgrade
+ 1000,
+);
+
+# =============================================================================
+# Basic sanity tests on a NEW bin
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 0;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mo',
+ scale => 1);
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from a given value
+SKIP:
+{
+ my $TEST_NO = 1;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo',
+ scale => 1.15,
+ multi => '0x123400');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets start from a given value
+SKIP:
+{
+ my $TEST_NO = 2;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO',
+ scale => 1.15,
+ offset => '0x432100');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets start from given values
+SKIP:
+{
+ my $TEST_NO = 3;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO',
+ scale => 1.15,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from a given value, multi wraps
+SKIP:
+{
+ my $TEST_NO = 4;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_Mo_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets start from a given value, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 5;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_mO_wrap',
+ scale => 1.15,
+ offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from a given value, offsets start from a given value,
+# multi wraps, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 6;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $node = create_new_node('simple_MO_wrap',
+ scale => 1.15,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ multi_bounds($node);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# pg_upgrade tests
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 100;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from a given value
+SKIP:
+{
+ my $TEST_NO = 101;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0x123400');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets start from a given value
+SKIP:
+{
+ my $TEST_NO = 102;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0x432100');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi and offsets start from given values
+SKIP:
+{
+ my $TEST_NO = 103;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xDEAD00', offset => '0xBEEF00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from a given value, multi wraps
+SKIP:
+{
+ my $TEST_NO = 104;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'Mo_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# offsets start from a given value, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 105;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'mO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# multi starts from a given value, offsets start from a given value,
+# multi wraps, offsets wrap
+SKIP:
+{
+ my $TEST_NO = 106;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'MO_wrap';
+ my $oldnode = create_old_node("old_$dbname",
+ scale => 1.2,
+ multi => '0xFFFF7000', offset => '0xFFFFFC00');
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+# =============================================================================
+# Self upgrade
+# =============================================================================
+
+# starts from zero
+SKIP:
+{
+ my $TEST_NO = 1000;
+ skip "do not test case $TEST_NO", 1
+ unless ( grep( /^$TEST_NO$/, @TESTS ) );
+
+ my $dbname = 'self_upgrade';
+ my $oldnode = create_new_node("old_$dbname",
+ scale => 1);
+ my $newnode = PostgreSQL::Test::Cluster->new("new_$dbname");
+ $newnode->init();
+
+ do_upgrade($oldnode, $newnode);
+ ok(1, "TEST $TEST_NO PASSED");
+}
+
+done_testing();
--
2.43.0
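
The wrap cases in the test above start the old cluster at offset 0xFFFFFC00 so that the 32-bit members offset wraps during the run. As a rough illustration of what the converter then has to copy, the sketch below counts the members between the oldest and next offsets, mirroring the "next_offset += ((uint64) 1 << 32) - 1" adjustment in convert_multixact_members(). The function name members_to_convert() is hypothetical, and the sketch assumes both old values fit in 32 bits and that offset 0 is never used after a wrap.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of members in [oldest_offset, next_offset)
 * for an old cluster with 32-bit offsets.  After a wrap the old offsets
 * continue from 1, not 0, so the wrapped range is one shorter than 2^32.
 */
static uint64_t
members_to_convert(uint32_t oldest_offset, uint32_t next_offset)
{
	uint64_t	next = next_offset;

	if (next < oldest_offset)
		next += ((uint64_t) 1 << 32) - 1;	/* unwrap, skipping offset 0 */

	assert(next >= oldest_offset);
	return next - oldest_offset;
}
```

For example, with oldest_offset = 0xFFFFFC00 and next_offset = 0x100 this gives 1024 members before the wrap plus 255 after it (offsets 1 to 0xFF), 1279 in total.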
Thread overview: 21+ messages
2024-04-25 14:20 Re: POC: make mxidoff 64 bits Maxim Orlov <[email protected]>
2024-08-14 15:30 ` Maxim Orlov <[email protected]>
2024-09-03 13:30 ` Maxim Orlov <[email protected]>
2024-09-03 13:32 ` Alexander Korotkov <[email protected]>
2024-09-04 08:49 ` Maxim Orlov <[email protected]>
2024-09-07 04:36 ` Maxim Orlov <[email protected]>
2024-09-12 12:09 ` Pavel Borisov <[email protected]>
2024-09-12 12:25 ` Pavel Borisov <[email protected]>
2024-09-12 13:14 ` Alvaro Herrera <[email protected]>
2024-10-22 09:43 ` Heikki Linnakangas <[email protected]>
2024-10-22 16:33 ` Maxim Orlov <[email protected]>
2024-10-23 15:55 ` Maxim Orlov <[email protected]>
2024-10-25 03:38 ` wenhui qiu <[email protected]>
2024-11-08 18:10 ` Maxim Orlov <[email protected]>
2024-11-11 23:31 ` Heikki Linnakangas <[email protected]>
2024-11-13 15:44 ` Maxim Orlov <[email protected]>
2024-11-15 08:41 ` Maxim Orlov <[email protected]>
2024-11-15 11:06 ` Heikki Linnakangas <[email protected]>
2024-11-15 16:19 ` Maxim Orlov <[email protected]>
2024-11-18 13:22 ` Maxim Orlov <[email protected]>
2024-11-19 17:53 ` Maxim Orlov <[email protected]>