public inbox for [email protected]
help / color / mirror / Atom feedFrom: Maxim Orlov <[email protected]>
To: Heikki Linnakangas <[email protected]>
Cc: wenhui qiu <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Cc: Ashutosh Bapat <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: make mxidoff 64 bits
Date: Thu, 30 Oct 2025 19:17:00 +0300
Message-ID: <CACG=ezb53xpK39y2oe8H5BAAEnrtpqAjTF_xKoFZh8z6PJY7zA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CACG=ezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com>
<CACG=ezYiLzCSo43uTPzAeq8ZCnGSkAsw061=oMyw5J1NUZ9Jwg@mail.gmail.com>
<CAExHW5soKc9mhLhroi__yrPD-ymkFbz=e5hyZ34iqjM-cdK9_g@mail.gmail.com>
<CACG=ezaLdhpFY9_qr7B6s7kkg_=s5S_ZD2=dsSeBNgdWWWuKbg@mail.gmail.com>
<CACG=ezaFoCx4XEGi4gQWrusYD81LTLTKBUdgs-mEjNwzVarRnw@mail.gmail.com>
<CACG=ezarNdLaTdr3fdWGyVJEzRAa=Vd5LLmu7VN2r7XL6LH8xA@mail.gmail.com>
<CAPpHfdtPybyMYBj-x3-Z5=4bj_vhYk2R0nezfy=Vjcz4QBMDgw@mail.gmail.com>
<CACG=ezaCc4bFfua-VA1NB6wppMPwuMmZiGdrkb-iYK9ZmQa6gg@mail.gmail.com>
<CAGjGUAJUmSFMunCcK8DXcjLrs2Hfk2kFiaWDTc6ti03S8Echmw@mail.gmail.com>
<CACG=ezaCKR+--O2TZrm3jYGgpDjdpLXdz3marG6t_=nzP5+Gog@mail.gmail.com>
<CAGjGUA+uxpbkRaxkZmcSNiGXJ_G3Zj4-gzaiy=94DH7rvE8tig@mail.gmail.com>
<CACG=ezbPUASDL1eJ+c-ZkJMwRPukvp3EL0q1vSUa1h+fnX8y3g@mail.gmail.com>
<[email protected]>
<CACG=ezaABYDepYf24MUNxc2oHRERxXbXHNMP+i-Pr1AXu26x0A@mail.gmail.com>
<[email protected]>
On Thu, 30 Oct 2025 at 12:10, Heikki Linnakangas <[email protected]> wrote:
>
> Oh I see, the 'base' is not necessarily the base offset of the first
> multixact on the page, it's the base offset of the first multixid that
> is written to the page. And the (short) offsets can be negative. That's
> a frighteningly clever encoding scheme. One upshot of that is that WAL
> redo might get construct the page with a different 'base'. I guess that
> works, but it scares me. Could we come up with a more deterministic scheme?
>
> Definitely! The most stable approach is the one we had before, which
used actual 64-bit offsets in the SLRU. To be honest, I'm completely
happy with it. After all, what's most important for me is to have 64-bit
xids in Postgres, and this patch is a step towards that goal.
PFA v20 returns to using actual 64-bit offsets for on-disk SLRU
segments.
Fortunately, now that I've separated reading and writing offsets into
different functions, switching from one implementation to another is
easy to do.
Here's a quick overview of the current state of the patch:
1) Access to the offset is placed to separate calls:
MXOffsetWrite/MXOffsetRead.
2) I abandoned byte juggling in pg_upgrade and moved to using logic that
replicates the work with offsets im multixact.c
3) As a result, the update issue came down to the correct implementation
of functions MXOffsetWrite/MXOffsetRead.
4) The only question that remains is the question of disk representation
of 64-bit offsets in SLRU segments.
My thoughts on point (4).
Using 32-bit offsets + some kind of packing:
Pros:
+ Reduce the total disc space used by the segments; ideally it is
almost the same as before.
Cons:
- Reduces reliability (losing a part will most likely result in losing
the entire page).
- Complicates code, especially considering that segments may be written
to the page in random order.
Using 64-bit offsets in SLRU:
Pros:
+ Easy to implement/transparent logic.
Cons:
- Increases the amount of disk space used.
In terms of speed, I'm not sure which will be faster. On the one hand,
64-bit eliminates the necessity for calculations and branching. On the
other hand, the amount of data used will increase.
I am not opposed to any of these options, as our primary goal is getting
64-bit offsets. However, I like the approach using full 64-bit offsets
in SLRU, because it is more clear and, should we say, robust. Yes, it
will increase the number of segment, however this is not heap data in
for a table. Under typical circumstances, there should not be too many
such segments.
--
Best regards,
Maxim Orlov.
From dd45bfb97be126c03a1c4e41f5794c5726dcd413 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 29 Oct 2025 14:19:56 +0300
Subject: [PATCH v20 5/5] TEST: Add test for 64-bit mxoff in pg_upgrade
---
src/bin/pg_upgrade/t/007_mxoff.pl | 461 ++++++++++++++++++++++++++++++
1 file changed, 461 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/007_mxoff.pl
diff --git a/src/bin/pg_upgrade/t/007_mxoff.pl b/src/bin/pg_upgrade/t/007_mxoff.pl
new file mode 100644
index 00000000000..26fd6e9c5d0
--- /dev/null
+++ b/src/bin/pg_upgrade/t/007_mxoff.pl
@@ -0,0 +1,461 @@
+
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use Math::BigInt;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::AdjustDump;
+use PostgreSQL::Test::AdjustUpgrade;
+use Test::More;
+
+# This test involves different multitransaction states, similarly to that of
+# 002_pg_upgrade.pl.
+
+unless (defined($ENV{oldinstall}))
+{
+ plan skip_all => 'to run test set oldinstall environment variable to the pre 64-bit mxoff cluster';
+}
+
+# Temp dir for a dumps.
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Get NextMultiOffset.
+sub next_mxoff
+{
+ my $node = shift;
+
+ my $pg_controldata_path =
+ defined($node->install_path) ?
+ $node->install_path . '/bin/pg_controldata' :
+ 'pg_controldata';
+ my ($stdout, $stderr) = run_command([ $pg_controldata_path,
+ $node->data_dir ]);
+ my @control_data = split("\n", $stdout);
+ my $next_mxoff = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+ {
+ $next_mxoff = $1;
+ last;
+ }
+ }
+ die "NextMultiOffset not found in control file\n"
+ unless defined($next_mxoff);
+
+ return $next_mxoff;
+}
+
+# Consume around 10k of mxoffsets.
+sub mxact_eater
+{
+ my $node = shift;
+ my $tbl = 'FOO';
+
+ my ($mxoff1, $mxoff2);
+
+ $mxoff1 = next_mxoff($node);
+ $node->start;
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+
+ # consume around 10k mxoff
+ my $nclients = 10;
+ my $update_every = 75;
+ my @connections = ();
+
+ for (0..$nclients)
+ {
+ my $conn = $node->background_psql('postgres');
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ for (my $i = 0; $i < 1000; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} FOR KEY SHARE");
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+
+ $node->stop;
+ $mxoff2 = next_mxoff($node);
+
+ return $mxoff1, $mxoff2;
+}
+
+# Consume around 2M of mxoffsets.
+sub mxact_huge_eater
+{
+ my $node = shift;
+ my $tbl = 'FOO';
+
+ my ($mxoff1, $mxoff2);
+
+ $mxoff1 = next_mxoff($node);
+ $node->start;
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+
+ # consume around 1M mxoff
+ my $nclients = 10;
+ my $update_every = 95;
+ my @connections = ();
+ my $timeout = 10 * $PostgreSQL::Test::Utils::timeout_default;
+
+ for (0..$nclients)
+ {
+ my $conn = $node->background_psql('postgres',
+ timeout => $timeout);
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ # It's a long process, better to tell about progress.
+ my $n_steps = 200_000;
+ my $step = int($n_steps / 10);
+
+ diag "\nstart to consume mxoffsets ...\n";
+ for (my $i = 0; $i < $n_steps; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ # Perform some non-key UPDATEs too, to exercise different multixact
+ # member statuses.
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} " .
+ "TABLESAMPLE SYSTEM (85) " .
+ "FOR KEY SHARE");
+ }
+
+ if ($i % $step == 0)
+ {
+ my $done = int(($i / $n_steps) * 100);
+ diag "$done% done...";
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+
+ $node->stop;
+ $mxoff2 = next_mxoff($node);
+
+ return $mxoff1, $mxoff2;
+}
+
+# Set oldest multixact-offset
+sub reset_mxoff
+{
+ my $node = shift;
+ my $offset = shift;
+
+ my $pg_resetwal_path = $node->install_path . '/bin/pg_resetwal';
+
+ # Get block size
+ my $out = (run_command([ $pg_resetwal_path, '--dry-run',
+ $node->data_dir ]))[0];
+ $out =~ /^Database block size: *(\d+)$/m or die;
+ my $blcksz = $1;
+
+ # Reset to new offset
+ my @cmd = ($pg_resetwal_path, '--pgdata' => $node->data_dir);
+ push @cmd, '--multixact-offset' => $offset;
+ command_ok(\@cmd, 'set oldest multixact-offset');
+
+ # Fill empty pg_multixact/members segment
+ my $mult = 32 * int($blcksz / 20) * 4;
+ my $segname = sprintf "%04X", $offset / $mult;
+
+ my @dd = ('dd');
+ push @dd, "if=/dev/zero";
+ push @dd, "of=" . $node->data_dir . "/pg_multixact/members/" . $segname;
+ push @dd, "bs=$blcksz";
+ push @dd, "count=32";
+ command_ok(\@dd, 'fill empty multixact-members');
+}
+
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "could not open $dump_adjusted for writing $!";
+
+ $node->run_log(
+ [
+ 'pg_dump', '--no-sync',
+ '--restrict-key' => 'test',
+ '-d' => $node->connstr($db),
+ '-f' => $dumpfile
+ ]);
+
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
+# Main test workhorse routine.
+# Make pg_upgrade, dump data and compare it.
+sub run_test
+{
+ my $tag = shift;
+ my $oldnode = shift;
+ my $newnode = shift;
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '--old-datadir' => $oldnode->data_dir,
+ '--new-datadir' => $newnode->data_dir,
+ '--old-bindir' => $oldnode->config_data('--bindir'),
+ '--new-bindir' => $newnode->config_data('--bindir'),
+ '--socketdir' => $newnode->host,
+ '--old-port' => $oldnode->port,
+ '--new-port' => $newnode->port,
+ $mode,
+ ],
+ 'run of pg_upgrade for new instance');
+ ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ removed after pg_upgrade success");
+
+ $oldnode->start;
+ my $src_dump =
+ get_dump_for_comparison($oldnode, 'postgres',
+ "oldnode_${tag}_dump", 0);
+ $oldnode->stop;
+
+ $newnode->start;
+ my $dst_dump =
+ get_dump_for_comparison($newnode, 'postgres',
+ "newnode_${tag}_dump", 0);
+ $newnode->stop;
+
+ compare_files($src_dump, $dst_dump,
+ 'dump outputs from original and restored regression databases match');
+}
+
+sub to_hex
+{
+ my $arg = shift;
+
+ $arg = Math::BigInt->new($arg);
+ $arg = $arg->as_hex();
+
+ return $arg;
+}
+
+# case #1: start old node from defaults
+{
+ my $tag = 1;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+ $old->init(extra => ['-k']);
+
+ my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #2: start old node from before 32-bit wraparound
+{
+ my $tag = 2;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+ reset_mxoff($old, 0xFFFF0000);
+
+ my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #3: start old node near 32-bit wraparound and reach wraparound state.
+{
+ my $tag = 3;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+
+ reset_mxoff($old, 0xFFFFEC77);
+ my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #4: start old node from defaults
+{
+ my $tag = 4;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+
+ diag "test #${tag} for multiple mxoff segments";
+ my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #5: start old node from before 32-bit wraparound
+{
+ my $tag = 5;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+ reset_mxoff($old, 0xFFFF0000);
+
+ diag "test #${tag} for multiple mxoff segments";
+ my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #6: start old node near 32-bit wraparound and reach wraparound state.
+{
+ my $tag = 6;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+
+ reset_mxoff($old, 0xFFFFFFFF - 1_000_000);
+ my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+ diag "test #${tag} for multiple mxoff segments";
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+done_testing();
--
2.43.0
From 9ae48f1a66fcb2f1651a168c481214f1f9b6e201 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Fri, 24 Oct 2025 11:47:50 +0300
Subject: [PATCH v20 4/5] TEST: bump catversion
To avoid constant CF-bot complains, make catversion bump in a separate
commit.
NOTE: keep it in sync with MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
---
src/include/catalog/catversion.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 18e95179ab6..6a13fa3cdb0 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202510281
+#define CATALOG_VERSION_NO 999999999
#endif
--
2.43.0
Attachments:
[text/plain] v20-0005-TEST-Add-test-for-64-bit-mxoff-in-pg_upgrade.patch.txt (11.4K, 3-v20-0005-TEST-Add-test-for-64-bit-mxoff-in-pg_upgrade.patch.txt)
download | inline diff:
From dd45bfb97be126c03a1c4e41f5794c5726dcd413 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 29 Oct 2025 14:19:56 +0300
Subject: [PATCH v20 5/5] TEST: Add test for 64-bit mxoff in pg_upgrade
---
src/bin/pg_upgrade/t/007_mxoff.pl | 461 ++++++++++++++++++++++++++++++
1 file changed, 461 insertions(+)
create mode 100644 src/bin/pg_upgrade/t/007_mxoff.pl
diff --git a/src/bin/pg_upgrade/t/007_mxoff.pl b/src/bin/pg_upgrade/t/007_mxoff.pl
new file mode 100644
index 00000000000..26fd6e9c5d0
--- /dev/null
+++ b/src/bin/pg_upgrade/t/007_mxoff.pl
@@ -0,0 +1,461 @@
+
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use Math::BigInt;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::AdjustDump;
+use PostgreSQL::Test::AdjustUpgrade;
+use Test::More;
+
+# This test involves different multitransaction states, similarly to that of
+# 002_pg_upgrade.pl.
+
+unless (defined($ENV{oldinstall}))
+{
+ plan skip_all => 'to run test set oldinstall environment variable to the pre 64-bit mxoff cluster';
+}
+
+# Temp dir for a dumps.
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Get NextMultiOffset.
+sub next_mxoff
+{
+ my $node = shift;
+
+ my $pg_controldata_path =
+ defined($node->install_path) ?
+ $node->install_path . '/bin/pg_controldata' :
+ 'pg_controldata';
+ my ($stdout, $stderr) = run_command([ $pg_controldata_path,
+ $node->data_dir ]);
+ my @control_data = split("\n", $stdout);
+ my $next_mxoff = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+ {
+ $next_mxoff = $1;
+ last;
+ }
+ }
+ die "NextMultiOffset not found in control file\n"
+ unless defined($next_mxoff);
+
+ return $next_mxoff;
+}
+
+# Consume around 10k of mxoffsets.
+sub mxact_eater
+{
+ my $node = shift;
+ my $tbl = 'FOO';
+
+ my ($mxoff1, $mxoff2);
+
+ $mxoff1 = next_mxoff($node);
+ $node->start;
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+
+ # consume around 10k mxoff
+ my $nclients = 10;
+ my $update_every = 75;
+ my @connections = ();
+
+ for (0..$nclients)
+ {
+ my $conn = $node->background_psql('postgres');
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ for (my $i = 0; $i < 1000; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} FOR KEY SHARE");
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+
+ $node->stop;
+ $mxoff2 = next_mxoff($node);
+
+ return $mxoff1, $mxoff2;
+}
+
+# Consume around 2M of mxoffsets.
+sub mxact_huge_eater
+{
+ my $node = shift;
+ my $tbl = 'FOO';
+
+ my ($mxoff1, $mxoff2);
+
+ $mxoff1 = next_mxoff($node);
+ $node->start;
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+
+ # consume around 1M mxoff
+ my $nclients = 10;
+ my $update_every = 95;
+ my @connections = ();
+ my $timeout = 10 * $PostgreSQL::Test::Utils::timeout_default;
+
+ for (0..$nclients)
+ {
+ my $conn = $node->background_psql('postgres',
+ timeout => $timeout);
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ # It's a long process, better to tell about progress.
+ my $n_steps = 200_000;
+ my $step = int($n_steps / 10);
+
+ diag "\nstart to consume mxoffsets ...\n";
+ for (my $i = 0; $i < $n_steps; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ # Perform some non-key UPDATEs too, to exercise different multixact
+ # member statuses.
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} " .
+ "TABLESAMPLE SYSTEM (85) " .
+ "FOR KEY SHARE");
+ }
+
+ if ($i % $step == 0)
+ {
+ my $done = int(($i / $n_steps) * 100);
+ diag "$done% done...";
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+
+ $node->stop;
+ $mxoff2 = next_mxoff($node);
+
+ return $mxoff1, $mxoff2;
+}
+
+# Set oldest multixact-offset
+sub reset_mxoff
+{
+ my $node = shift;
+ my $offset = shift;
+
+ my $pg_resetwal_path = $node->install_path . '/bin/pg_resetwal';
+
+ # Get block size
+ my $out = (run_command([ $pg_resetwal_path, '--dry-run',
+ $node->data_dir ]))[0];
+ $out =~ /^Database block size: *(\d+)$/m or die;
+ my $blcksz = $1;
+
+ # Reset to new offset
+ my @cmd = ($pg_resetwal_path, '--pgdata' => $node->data_dir);
+ push @cmd, '--multixact-offset' => $offset;
+ command_ok(\@cmd, 'set oldest multixact-offset');
+
+ # Fill empty pg_multixact/members segment
+ my $mult = 32 * int($blcksz / 20) * 4;
+ my $segname = sprintf "%04X", $offset / $mult;
+
+ my @dd = ('dd');
+ push @dd, "if=/dev/zero";
+ push @dd, "of=" . $node->data_dir . "/pg_multixact/members/" . $segname;
+ push @dd, "bs=$blcksz";
+ push @dd, "count=32";
+ command_ok(\@dd, 'fill empty multixact-members');
+}
+
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "could not open $dump_adjusted for writing $!";
+
+ $node->run_log(
+ [
+ 'pg_dump', '--no-sync',
+ '--restrict-key' => 'test',
+ '-d' => $node->connstr($db),
+ '-f' => $dumpfile
+ ]);
+
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
+# Main test workhorse routine.
+# Make pg_upgrade, dump data and compare it.
+sub run_test
+{
+ my $tag = shift;
+ my $oldnode = shift;
+ my $newnode = shift;
+
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '--old-datadir' => $oldnode->data_dir,
+ '--new-datadir' => $newnode->data_dir,
+ '--old-bindir' => $oldnode->config_data('--bindir'),
+ '--new-bindir' => $newnode->config_data('--bindir'),
+ '--socketdir' => $newnode->host,
+ '--old-port' => $oldnode->port,
+ '--new-port' => $newnode->port,
+ $mode,
+ ],
+ 'run of pg_upgrade for new instance');
+ ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ removed after pg_upgrade success");
+
+ $oldnode->start;
+ my $src_dump =
+ get_dump_for_comparison($oldnode, 'postgres',
+ "oldnode_${tag}_dump", 0);
+ $oldnode->stop;
+
+ $newnode->start;
+ my $dst_dump =
+ get_dump_for_comparison($newnode, 'postgres',
+ "newnode_${tag}_dump", 0);
+ $newnode->stop;
+
+ compare_files($src_dump, $dst_dump,
+ 'dump outputs from original and restored regression databases match');
+}
+
+sub to_hex
+{
+ my $arg = shift;
+
+ $arg = Math::BigInt->new($arg);
+ $arg = $arg->as_hex();
+
+ return $arg;
+}
+
+# case #1: start old node from defaults
+{
+ my $tag = 1;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+ $old->init(extra => ['-k']);
+
+ my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #2: start old node from before 32-bit wraparound
+{
+ my $tag = 2;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+ reset_mxoff($old, 0xFFFF0000);
+
+ my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #3: start old node near 32-bit wraparound and reach wraparound state.
+{
+ my $tag = 3;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+
+ reset_mxoff($old, 0xFFFFEC77);
+ my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #4: start old node from defaults
+{
+ my $tag = 4;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+
+ diag "test #${tag} for multiple mxoff segments";
+ my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #5: start old node from before 32-bit wraparound
+{
+ my $tag = 5;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+ reset_mxoff($old, 0xFFFF0000);
+
+ diag "test #${tag} for multiple mxoff segments";
+ my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #6: start old node near 32-bit wraparound and reach wraparound state.
+{
+ my $tag = 6;
+ my $old =
+ PostgreSQL::Test::Cluster->new("oldnode${tag}",
+ install_path => $ENV{oldinstall});
+
+ $old->init(extra => ['-k']);
+
+ reset_mxoff($old, 0xFFFFFFFF - 1_000_000);
+ my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+ diag "test #${tag} for multiple mxoff segments";
+ my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+ $new->init;
+
+ run_test($tag, $old, $new);
+
+ $start_mxoff = to_hex($start_mxoff);
+ $finish_mxoff = to_hex($finish_mxoff);
+
+ my $next_mxoff = to_hex(next_mxoff($new));
+
+ note ">>> case #${tag}\n" .
+ " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+ " newnode mxoff ${next_mxoff}\n";
+}
+
+done_testing();
--
2.43.0
[application/octet-stream] v20-0003-Add-test-for-64-bit-mxoff-in-pg_resetwal.patch (4.9K, 4-v20-0003-Add-test-for-64-bit-mxoff-in-pg_resetwal.patch)
download | inline diff:
From 2a6ae43f5be3a45741c233b001dfb8721c7f8217 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Tue, 28 Oct 2025 19:08:26 +0300
Subject: [PATCH v20 3/5] Add test for 64-bit mxoff in pg_resetwal
---
src/bin/pg_resetwal/meson.build | 1 +
src/bin/pg_resetwal/t/003_mxoff.pl | 170 +++++++++++++++++++++++++++++
2 files changed, 171 insertions(+)
create mode 100644 src/bin/pg_resetwal/t/003_mxoff.pl
diff --git a/src/bin/pg_resetwal/meson.build b/src/bin/pg_resetwal/meson.build
index 290832b2299..1e2dfb38a5b 100644
--- a/src/bin/pg_resetwal/meson.build
+++ b/src/bin/pg_resetwal/meson.build
@@ -25,6 +25,7 @@ tests += {
'tests': [
't/001_basic.pl',
't/002_corrupted.pl',
+ 't/003_mxoff.pl',
],
},
}
diff --git a/src/bin/pg_resetwal/t/003_mxoff.pl b/src/bin/pg_resetwal/t/003_mxoff.pl
new file mode 100644
index 00000000000..3c1b7fa1d33
--- /dev/null
+++ b/src/bin/pg_resetwal/t/003_mxoff.pl
@@ -0,0 +1,170 @@
+
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use Math::BigInt;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub mxact_eater
+{
+ my $node = shift;
+ my $tbl = shift;
+
+ $node->start;
+ $node->safe_psql('postgres',
+ "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+ " WITH (AUTOVACUUM_ENABLED=FALSE);" .
+ "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) G;");
+
+ # consume around 10k multixact-offsetfs
+ my $nclients = 10;
+ my $update_every = 75;
+ my @connections = ();
+
+ for (0..$nclients)
+ {
+ my $conn = $node->background_psql('postgres');
+ $conn->query_safe("BEGIN");
+
+ push(@connections, $conn);
+ }
+
+ for (my $i = 0; $i < 1000; $i++)
+ {
+ my $conn = $connections[$i % $nclients];
+
+ $conn->query_safe("COMMIT;");
+ $conn->query_safe("BEGIN");
+
+ if ($i % $update_every == 0)
+ {
+ $conn->query_safe(
+ "UPDATE ${tbl} SET " .
+ "N_UPDATED = N_UPDATED + 1 " .
+ "WHERE I = ${i} % 50");
+ }
+ else
+ {
+ $conn->query_safe(
+ "SELECT * FROM ${tbl} FOR KEY SHARE");
+ }
+ }
+
+ for my $conn (@connections)
+ {
+ $conn->quit();
+ }
+
+ $node->stop;
+}
+
+sub next_mxoff
+{
+ my $node = shift;
+ my ($stdout, $stderr) =
+ run_command([ 'pg_controldata', $node->data_dir ]);
+ my @control_data = split("\n", $stdout);
+ my $next_mxoff = undef;
+
+ foreach (@control_data)
+ {
+ if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+ {
+ $next_mxoff = $1;
+ last;
+ }
+ }
+ die "NextMultiOffset not found in control file\n"
+ unless defined($next_mxoff);
+
+ return $next_mxoff;
+}
+
+sub reset_mxoff
+{
+ my $node = shift;
+ my $offset = shift;
+ $offset = Math::BigInt->new($offset);
+
+ # Get block size
+ my $out = (run_command([ 'pg_resetwal', '--dry-run', $node->data_dir ]))[0];
+ $out =~ /^Database block size: *(\d+)$/m or die;
+ my $blcksz = $1;
+
+ # Reset to new offset
+ my @cmd = ('pg_resetwal', '--pgdata' => $node->data_dir);
+ push @cmd, '--multixact-offset' => $offset->as_hex();
+ command_ok(\@cmd, 'set oldest multixact-offset');
+
+ # Fill empty pg_multixact/members segment
+ my $mult = 32 * int($blcksz / 20) * 4;
+ my $segname = sprintf "%015X", $offset / $mult;
+
+ my @dd = ('dd');
+ push @dd, "if=/dev/zero";
+ push @dd, "of=" . $node->data_dir . "/pg_multixact/members/" . $segname;
+ push @dd, "bs=$blcksz";
+ push @dd, "count=32";
+ command_ok(\@dd, 'fill empty multixact-members');
+}
+
+my ($off1, $off2);
+
+# start from defaults
+my $node1 = PostgreSQL::Test::Cluster->new('node1');
+$node1->init;
+$off1 = next_mxoff($node1);
+mxact_eater($node1, "FOO");
+$off2 = next_mxoff($node1);
+note "> start from $off1, finished at $off2\n";
+
+# start from before 32-bit wraparound
+my $node2 = PostgreSQL::Test::Cluster->new('node2');
+$node2->init;
+reset_mxoff($node2, 0xFFFF0000);
+$off1 = next_mxoff($node2);
+mxact_eater($node2, "FOO");
+$off2 = next_mxoff($node2);
+note "> start from $off1, finished at $off2\n";
+
+# start near 32-bit wraparound
+my $node3 = PostgreSQL::Test::Cluster->new('node3');
+$node3->init;
+reset_mxoff($node3, 0xFFFFEC77);
+$off1 = next_mxoff($node3);
+mxact_eater($node3, "FOO");
+$off2 = next_mxoff($node3);
+note "> start from $off1, finished at $off2\n";
+
+# start over 32-bit wraparound
+my $node4 = PostgreSQL::Test::Cluster->new('node4');
+$node4->init;
+reset_mxoff($node4, '0xFFFFFFFF0000');
+$off1 = next_mxoff($node4);
+mxact_eater($node4, "FOO");
+$off2 = next_mxoff($node3);
+note "> start from $off1, finished at $off2\n";
+
+# check invariant
+$node1->start;
+$node2->start;
+$node3->start;
+$node4->start;
+
+my $var1 = $node1->safe_psql('postgres', 'TABLE FOO');
+my $var2 = $node2->safe_psql('postgres', 'TABLE FOO');
+my $var3 = $node3->safe_psql('postgres', 'TABLE FOO');
+my $var4 = $node4->safe_psql('postgres', 'TABLE FOO');
+ok($var1 eq $var2 eq $var3 eq $var4,
+ 'check table invariant in all nodes');
+
+$node4->stop;
+$node3->stop;
+$node2->stop;
+$node1->stop;
+
+done_testing();
--
2.43.0
[application/octet-stream] v20-0001-Use-64-bit-multixact-offsets.patch (29.2K, 5-v20-0001-Use-64-bit-multixact-offsets.patch)
download | inline diff:
From 651dbd070e96059980c50d3dc414ffb7b590bb4b Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 7 Aug 2024 16:35:22 +0300
Subject: [PATCH v20 1/5] Use 64-bit multixact offsets
Switching to 64-bit multitransaction offsets removes wraparound and the
2^32 limit on their total number.
Author: Maxim Orlov <[email protected]>
---
src/backend/access/rmgrdesc/mxactdesc.c | 4 +-
src/backend/access/rmgrdesc/xlogdesc.c | 2 +-
src/backend/access/transam/multixact.c | 312 ++++------------------
src/backend/access/transam/xlogrecovery.c | 2 +-
src/backend/commands/vacuum.c | 2 +-
src/backend/postmaster/autovacuum.c | 4 +-
src/bin/pg_controldata/pg_controldata.c | 2 +-
src/bin/pg_resetwal/pg_resetwal.c | 6 +-
src/bin/pg_resetwal/t/001_basic.pl | 2 +-
src/include/access/multixact.h | 3 +-
src/include/c.h | 2 +-
11 files changed, 63 insertions(+), 278 deletions(-)
diff --git a/src/backend/access/rmgrdesc/mxactdesc.c b/src/backend/access/rmgrdesc/mxactdesc.c
index 3ca0582db36..052dd0a4ce5 100644
--- a/src/backend/access/rmgrdesc/mxactdesc.c
+++ b/src/backend/access/rmgrdesc/mxactdesc.c
@@ -65,7 +65,7 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
xl_multixact_create *xlrec = (xl_multixact_create *) rec;
int i;
- appendStringInfo(buf, "%u offset %u nmembers %d: ", xlrec->mid,
+ appendStringInfo(buf, "%u offset %" PRIu64 " nmembers %d: ", xlrec->mid,
xlrec->moff, xlrec->nmembers);
for (i = 0; i < xlrec->nmembers; i++)
out_member(buf, &xlrec->members[i]);
@@ -74,7 +74,7 @@ multixact_desc(StringInfo buf, XLogReaderState *record)
{
xl_multixact_truncate *xlrec = (xl_multixact_truncate *) rec;
- appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
+ appendStringInfo(buf, "offsets [%u, %u), members [%" PRIu64 ", %" PRIu64 ")",
xlrec->startTruncOff, xlrec->endTruncOff,
xlrec->startTruncMemb, xlrec->endTruncMemb);
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index cd6c2a2f650..441034f5929 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -66,7 +66,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
CheckPoint *checkpoint = (CheckPoint *) rec;
appendStringInfo(buf, "redo %X/%08X; "
- "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %u; "
+ "tli %u; prev tli %u; fpw %s; wal_level %s; xid %u:%u; oid %u; multi %u; offset %" PRIu64 "; "
"oldest xid %u in DB %u; oldest multi %u in DB %u; "
"oldest/newest commit timestamp xid: %u/%u; "
"oldest running xid %u; %s",
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9d5f130af7e..93a1e4cfd2a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -92,17 +92,9 @@
/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
- *
- * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
- * MultiXact page numbering also wraps around at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
- * take no explicit notice of that fact in this module, except when comparing
- * segment and page numbers in TruncateMultiXact (see
- * MultiXactOffsetPagePrecedes).
*/
-/* We need four bytes per offset */
+/* We need 8 bytes per offset */
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
static inline int64
@@ -208,10 +200,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
-/* Multixact members wraparound thresholds. */
-#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
-#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
- (MaxMultiXactOffset - MaxMultiXactOffset / 4)
+/*
+ * Multixact members warning threshold.
+ *
+ * If difference bettween nextOffset and oldestOffset exceed this value, we
+ * trigger autovacuumin order to release the disk space, reduce table bloat if
+ * possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
static inline MultiXactId
PreviousMultiXactId(MultiXactId multi)
@@ -228,6 +224,24 @@ static SlruCtlData MultiXactMemberCtlData;
#define MultiXactOffsetCtl (&MultiXactOffsetCtlData)
#define MultiXactMemberCtl (&MultiXactMemberCtlData)
+static inline MultiXactOffset
+MXOffsetRead(int entryno, int slotno)
+{
+ MultiXactOffset *offptr =
+ (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
+
+ return offptr[entryno];
+}
+
+static inline void
+MXOffsetWrite(int entryno, int slotno, MultiXactOffset offset)
+{
+ MultiXactOffset *offptr =
+ (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
+
+ offptr[entryno] = offset;
+}
+
/*
* MultiXact state shared across all backends. All this state is protected
* by MultiXactGenLock. (We also use SLRU bank's lock of MultiXactOffset and
@@ -268,9 +282,6 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
- /* support for members anti-wraparound measures */
- MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */
-
/*
* This is used to sleep until a multixact offset is written when we want
* to create the next one.
@@ -401,8 +412,6 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
- MultiXactOffset start, uint32 distance);
static bool SetOffsetVacuumLimit(bool is_startup);
static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
static void WriteMTruncateXlogRec(Oid oldestMultiDB,
@@ -911,7 +920,6 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
int64 prev_pageno;
int entryno;
int slotno;
- MultiXactOffset *offptr;
int i;
LWLock *lock;
LWLock *prevlock = NULL;
@@ -930,10 +938,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
* take the trouble to generalize the slru.c error reporting code.
*/
slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
- offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
- offptr += entryno;
- *offptr = offset;
+ MXOffsetWrite(entryno, slotno, offset);
MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
@@ -1155,78 +1161,6 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
- /*----------
- * Protect against overrun of the members space as well, with the
- * following rules:
- *
- * If we're past offsetStopLimit, refuse to generate more multis.
- * If we're close to offsetStopLimit, emit a warning.
- *
- * Arbitrarily, we start emitting warnings when we're 20 segments or less
- * from offsetStopLimit.
- *
- * Note we haven't updated the shared state yet, so if we fail at this
- * point, the multixact ID we grabbed can still be used by the next guy.
- *
- * Note that there is no point in forcing autovacuum runs here: the
- * multixact freeze settings would have to be reduced for that to have any
- * effect.
- *----------
- */
-#define OFFSET_WARN_SEGMENTS 20
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
- nmembers))
- {
- /* see comment in the corresponding offsets wraparound case */
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
-
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
- errdetail_plural("This command would create a multixact with %u members, but the remaining space is only enough for %u member.",
- "This command would create a multixact with %u members, but the remaining space is only enough for %u members.",
- MultiXactState->offsetStopLimit - nextOffset - 1,
- nmembers,
- MultiXactState->offsetStopLimit - nextOffset - 1),
- errhint("Execute a database-wide VACUUM in database with OID %u with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.",
- MultiXactState->oldestMultiXactDB)));
- }
-
- /*
- * Check whether we should kick autovacuum into action, to prevent members
- * wraparound. NB we use a much larger window to trigger autovacuum than
- * just the warning limit. The warning is just a measure of last resort -
- * this is in line with GetNewTransactionId's behaviour.
- */
- if (!MultiXactState->oldestOffsetKnown ||
- (MultiXactState->nextOffset - MultiXactState->oldestOffset
- > MULTIXACT_MEMBER_SAFE_THRESHOLD))
- {
- /*
- * To avoid swamping the postmaster with signals, we issue the autovac
- * request only when crossing a segment boundary. With default
- * compilation settings that's roughly after 50k members. This still
- * gives plenty of chances before we get into real trouble.
- */
- if ((MXOffsetToMemberPage(nextOffset) / SLRU_PAGES_PER_SEGMENT) !=
- (MXOffsetToMemberPage(nextOffset + nmembers) / SLRU_PAGES_PER_SEGMENT))
- SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
- }
-
- if (MultiXactState->oldestOffsetKnown &&
- MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
- nextOffset,
- nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
- ereport(WARNING,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg_plural("database with OID %u must be vacuumed before %d more multixact member is used",
- "database with OID %u must be vacuumed before %d more multixact members are used",
- MultiXactState->offsetStopLimit - nextOffset + nmembers,
- MultiXactState->oldestMultiXactDB,
- MultiXactState->offsetStopLimit - nextOffset + nmembers),
- errhint("Execute a database-wide VACUUM in that database with reduced \"vacuum_multixact_freeze_min_age\" and \"vacuum_multixact_freeze_table_age\" settings.")));
-
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1255,7 +1189,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
LWLockRelease(MultiXactGenLock);
- debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
+ debug_elog4(DEBUG2, "GetNew: returning %u offset %" PRIu64, result,
+ *offset);
return result;
}
@@ -1294,7 +1229,6 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
int64 prev_pageno;
int entryno;
int slotno;
- MultiXactOffset *offptr;
MultiXactOffset offset;
int length;
int truelength;
@@ -1418,9 +1352,8 @@ retry:
LWLockAcquire(lock, LW_EXCLUSIVE);
slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
- offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
- offptr += entryno;
- offset = *offptr;
+
+ offset = MXOffsetRead(entryno, slotno);
Assert(offset != 0);
@@ -1467,9 +1400,7 @@ retry:
slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
}
- offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
- offptr += entryno;
- nextMXOffset = *offptr;
+ nextMXOffset = MXOffsetRead(entryno, slotno);
if (nextMXOffset == 0)
{
@@ -1973,7 +1904,7 @@ MultiXactShmemInit(void)
"pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
LWTRANCHE_MULTIXACTMEMBER_SLRU,
SYNC_HANDLER_MULTIXACT_MEMBER,
- false);
+ true);
/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
/* Initialize our shared state struct */
@@ -2150,7 +2081,6 @@ TrimMultiXact(void)
slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
-
MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
@@ -2223,7 +2153,7 @@ MultiXactGetCheckptMulti(bool is_shutdown,
LWLockRelease(MultiXactGenLock);
debug_elog6(DEBUG2,
- "MultiXact: checkpoint is nextMulti %u, nextOffset %u, oldestMulti %u in DB %u",
+ "MultiXact: checkpoint is nextMulti %u, nextOffset %" PRIu64 ", oldestMulti %u in DB %u",
*nextMulti, *nextMultiOffset, *oldestMulti, *oldestMultiDB);
}
@@ -2258,7 +2188,7 @@ void
MultiXactSetNextMXact(MultiXactId nextMulti,
MultiXactOffset nextMultiOffset)
{
- debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %u",
+ debug_elog4(DEBUG2, "MultiXact: setting next multi to %u offset %" PRIu64,
nextMulti, nextMultiOffset);
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->nextMXact = nextMulti;
@@ -2449,7 +2379,7 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
}
if (MultiXactOffsetPrecedes(MultiXactState->nextOffset, minMultiOffset))
{
- debug_elog3(DEBUG2, "MultiXact: setting next offset to %u",
+ debug_elog3(DEBUG2, "MultiXact: setting next offset to %" PRIU64,
minMultiOffset);
MultiXactState->nextOffset = minMultiOffset;
}
@@ -2633,15 +2563,13 @@ GetOldestMultiXactId(void)
}
/*
- * Determine how aggressively we need to vacuum in order to prevent member
- * wraparound.
+ * Determine if we need to vacuum for member or not.
*
* To do so determine what's the oldest member offset and install the limit
* info in MultiXactState, where it can be used to prevent overrun of old data
* in the members SLRU area.
*
- * The return value is true if emergency autovacuum is required and false
- * otherwise.
+ * The return value is true if autovacuum is required and false otherwise.
*/
static bool
SetOffsetVacuumLimit(bool is_startup)
@@ -2653,8 +2581,6 @@ SetOffsetVacuumLimit(bool is_startup)
MultiXactOffset nextOffset;
bool oldestOffsetKnown = false;
bool prevOldestOffsetKnown;
- MultiXactOffset offsetStopLimit = 0;
- MultiXactOffset prevOffsetStopLimit;
/*
* NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2669,7 +2595,6 @@ SetOffsetVacuumLimit(bool is_startup)
nextOffset = MultiXactState->nextOffset;
prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
prevOldestOffset = MultiXactState->oldestOffset;
- prevOffsetStopLimit = MultiXactState->offsetStopLimit;
Assert(MultiXactState->finishedStartup);
LWLockRelease(MultiXactGenLock);
@@ -2700,11 +2625,7 @@ SetOffsetVacuumLimit(bool is_startup)
oldestOffsetKnown =
find_multixact_start(oldestMultiXactId, &oldestOffset);
- if (oldestOffsetKnown)
- ereport(DEBUG1,
- (errmsg_internal("oldest MultiXactId member is at offset %u",
- oldestOffset)));
- else
+ if (!oldestOffsetKnown)
ereport(LOG,
(errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
oldestMultiXactId)));
@@ -2717,24 +2638,7 @@ SetOffsetVacuumLimit(bool is_startup)
* overrun of old data in the members SLRU area. We can only do so if the
* oldest offset is known though.
*/
- if (oldestOffsetKnown)
- {
- /* move back to start of the corresponding segment */
- offsetStopLimit = oldestOffset - (oldestOffset %
- (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
-
- /* always leave one segment before the wraparound point */
- offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
-
- if (!prevOldestOffsetKnown && !is_startup)
- ereport(LOG,
- (errmsg("MultiXact member wraparound protections are now enabled")));
-
- ereport(DEBUG1,
- (errmsg_internal("MultiXact member stop limit is now %u based on MultiXact %u",
- offsetStopLimit, oldestMultiXactId)));
- }
- else if (prevOldestOffsetKnown)
+ if (prevOldestOffsetKnown)
{
/*
* If we failed to get the oldest offset this time, but we have a
@@ -2744,69 +2648,19 @@ SetOffsetVacuumLimit(bool is_startup)
*/
oldestOffset = prevOldestOffset;
oldestOffsetKnown = true;
- offsetStopLimit = prevOffsetStopLimit;
}
/* Install the computed values */
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
MultiXactState->oldestOffset = oldestOffset;
MultiXactState->oldestOffsetKnown = oldestOffsetKnown;
- MultiXactState->offsetStopLimit = offsetStopLimit;
LWLockRelease(MultiXactGenLock);
/*
- * Do we need an emergency autovacuum? If we're not sure, assume yes.
+ * Do we need autovacuum? If we're not sure, assume yes.
*/
return !oldestOffsetKnown ||
- (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
-}
-
-/*
- * Return whether adding "distance" to "start" would move past "boundary".
- *
- * We use this to determine whether the addition is "wrapping around" the
- * boundary point, hence the name. The reason we don't want to use the regular
- * 2^31-modulo arithmetic here is that we want to be able to use the whole of
- * the 2^32-1 space here, allowing for more multixacts than would fit
- * otherwise.
- */
-static bool
-MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
- uint32 distance)
-{
- MultiXactOffset finish;
-
- /*
- * Note that offset number 0 is not used (see GetMultiXactIdMembers), so
- * if the addition wraps around the UINT_MAX boundary, skip that value.
- */
- finish = start + distance;
- if (finish < start)
- finish++;
-
- /*-----------------------------------------------------------------------
- * When the boundary is numerically greater than the starting point, any
- * value numerically between the two is not wrapped:
- *
- * <----S----B---->
- * [---) = F wrapped past B (and UINT_MAX)
- * [---) = F not wrapped
- * [----] = F wrapped past B
- *
- * When the boundary is numerically less than the starting point (i.e. the
- * UINT_MAX wraparound occurs somewhere in between) then all values in
- * between are wrapped:
- *
- * <----B----S---->
- * [---) = F not wrapped past B (but wrapped past UINT_MAX)
- * [---) = F wrapped past B (and UINT_MAX)
- * [----] = F not wrapped
- *-----------------------------------------------------------------------
- */
- if (start < boundary)
- return finish >= boundary || finish < start;
- else
- return finish >= boundary && finish < start;
+ (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
}
/*
@@ -2825,7 +2679,6 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
int64 pageno;
int entryno;
int slotno;
- MultiXactOffset *offptr;
Assert(MultiXactState->finishedStartup);
@@ -2843,9 +2696,9 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi);
- offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
- offptr += entryno;
- offset = *offptr;
+
+ offset = MXOffsetRead(entryno, slotno);
+
LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
*result = offset;
@@ -2893,73 +2746,6 @@ GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members,
return true;
}
-/*
- * Multixact members can be removed once the multixacts that refer to them
- * are older than every datminmxid. autovacuum_multixact_freeze_max_age and
- * vacuum_multixact_freeze_table_age work together to make sure we never have
- * too many multixacts; we hope that, at least under normal circumstances,
- * this will also be sufficient to keep us from using too many offsets.
- * However, if the average multixact has many members, we might exhaust the
- * members space while still using few enough members that these limits fail
- * to trigger relminmxid advancement by VACUUM. At that point, we'd have no
- * choice but to start failing multixact-creating operations with an error.
- *
- * To prevent that, if more than a threshold portion of the members space is
- * used, we effectively reduce autovacuum_multixact_freeze_max_age and
- * to a value just less than the number of multixacts in use. We hope that
- * this will quickly trigger autovacuuming on the table or tables with the
- * oldest relminmxid, thus allowing datminmxid values to advance and removing
- * some members.
- *
- * As the fraction of the member space currently in use grows, we become
- * more aggressive in clamping this value. That not only causes autovacuum
- * to ramp up, but also makes any manual vacuums the user issues more
- * aggressive. This happens because vacuum_get_cutoffs() will clamp the
- * freeze table and the minimum freeze age cutoffs based on the effective
- * autovacuum_multixact_freeze_max_age this function returns. In the worst
- * case, we'll claim the freeze_max_age to zero, and every vacuum of any
- * table will freeze every multixact.
- */
-int
-MultiXactMemberFreezeThreshold(void)
-{
- MultiXactOffset members;
- uint32 multixacts;
- uint32 victim_multixacts;
- double fraction;
- int result;
- MultiXactId oldestMultiXactId;
- MultiXactOffset oldestOffset;
-
- /* If we can't determine member space utilization, assume the worst. */
- if (!GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
- return 0;
-
- /* If member space utilization is low, no special action is required. */
- if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
- return autovacuum_multixact_freeze_max_age;
-
- /*
- * Compute a target for relminmxid advancement. The number of multixacts
- * we try to eliminate from the system is based on how far we are past
- * MULTIXACT_MEMBER_SAFE_THRESHOLD.
- */
- fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
- (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
- victim_multixacts = multixacts * fraction;
-
- /* fraction could be > 1.0, but lowest possible freeze age is zero */
- if (victim_multixacts > multixacts)
- return 0;
- result = multixacts - victim_multixacts;
-
- /*
- * Clamp to autovacuum_multixact_freeze_max_age, so that we never make
- * autovacuum less aggressive than it would otherwise be.
- */
- return Min(result, autovacuum_multixact_freeze_max_age);
-}
-
typedef struct mxtruncinfo
{
int64 earliestExistingPage;
@@ -3159,7 +2945,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
elog(DEBUG1, "performing multixact truncation: "
"offsets [%u, %u), offsets segments [%" PRIx64 ", %" PRIx64 "), "
- "members [%u, %u), members segments [%" PRIx64 ", %" PRIx64 ")",
+ "members [%" PRIu64 ", %" PRIu64 "), members segments [%" PRIx64 ", %" PRIx64 ")",
oldestMulti, newOldestMulti,
MultiXactIdToOffsetSegment(oldestMulti),
MultiXactIdToOffsetSegment(newOldestMulti),
@@ -3290,7 +3076,7 @@ MultiXactIdPrecedesOrEquals(MultiXactId multi1, MultiXactId multi2)
static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
- int32 diff = (int32) (offset1 - offset2);
+ int64 diff = (int64) (offset1 - offset2);
return (diff < 0);
}
@@ -3387,7 +3173,7 @@ multixact_redo(XLogReaderState *record)
elog(DEBUG1, "replaying multixact truncation: "
"offsets [%u, %u), offsets segments [%" PRIx64 ", %" PRIx64 "), "
- "members [%u, %u), members segments [%" PRIx64 ", %" PRIx64 ")",
+ "members [%" PRIu64 ", %" PRIu64 "), members segments [%" PRIx64 ", %" PRIx64 ")",
xlrec.startTruncOff, xlrec.endTruncOff,
MultiXactIdToOffsetSegment(xlrec.startTruncOff),
MultiXactIdToOffsetSegment(xlrec.endTruncOff),
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 3e3c4da01a2..3b2b0a522cb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -885,7 +885,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
U64FromFullTransactionId(checkPoint.nextXid),
checkPoint.nextOid)));
ereport(DEBUG1,
- (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %u",
+ (errmsg_internal("next MultiXactId: %u; next MultiXactOffset: %" PRIu64,
checkPoint.nextMulti, checkPoint.nextMultiOffset)));
ereport(DEBUG1,
(errmsg_internal("oldest unfrozen transaction ID: %u, in database %u",
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ed03e3bd50d..259ef60bd31 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1147,7 +1147,7 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams params,
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Almost ready to set freeze output parameters; check if OldestXmin or
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 5084af7dfb6..26385470c19 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1151,7 +1151,7 @@ do_start_worker(void)
/* Also determine the oldest datminmxid we will consider. */
recentMulti = ReadNextMultiXactId();
- multiForceLimit = recentMulti - MultiXactMemberFreezeThreshold();
+ multiForceLimit = recentMulti - autovacuum_multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -1939,7 +1939,7 @@ do_autovacuum(void)
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
* short of multixact member space.
*/
- effective_multixact_freeze_max_age = MultiXactMemberFreezeThreshold();
+ effective_multixact_freeze_max_age = autovacuum_multixact_freeze_max_age;
/*
* Find the pg_database entry and select the default freeze ages. We use
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 10de058ce91..5295108ade3 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -264,7 +264,7 @@ main(int argc, char *argv[])
ControlFile->checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile->checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
+ printf(_("Latest checkpoint's NextMultiOffset: %" PRIu64 "\n"),
ControlFile->checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile->checkPointCopy.oldestXid);
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index a89d72fc5cf..4e5eeced89d 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -267,7 +267,7 @@ main(int argc, char *argv[])
case 'O':
errno = 0;
- set_mxoff = strtoul(optarg, &endptr, 0);
+ set_mxoff = strtou64(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0' || errno != 0)
{
pg_log_error("invalid argument for option %s", "-O");
@@ -743,7 +743,7 @@ PrintControlValues(bool guessed)
ControlFile.checkPointCopy.nextOid);
printf(_("Latest checkpoint's NextMultiXactId: %u\n"),
ControlFile.checkPointCopy.nextMulti);
- printf(_("Latest checkpoint's NextMultiOffset: %u\n"),
+ printf(_("Latest checkpoint's NextMultiOffset: %" PRIu64 "\n"),
ControlFile.checkPointCopy.nextMultiOffset);
printf(_("Latest checkpoint's oldestXID: %u\n"),
ControlFile.checkPointCopy.oldestXid);
@@ -817,7 +817,7 @@ PrintNewControlValues(void)
if (set_mxoff != -1)
{
- printf(_("NextMultiOffset: %u\n"),
+ printf(_("NextMultiOffset: %" PRIu64 "\n"),
ControlFile.checkPointCopy.nextMultiOffset);
}
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index d6bbbd0ceda..cc89e0764ae 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -213,7 +213,7 @@ push @cmd,
sprintf("%d,%d", hex($files[0]) == 0 ? 3 : hex($files[0]), hex($files[-1]));
@files = get_slru_files('pg_multixact/offsets');
-$mult = 32 * $blcksz / 4;
+$mult = 32 * $blcksz / 8;
# --multixact-ids argument is "new,old"
push @cmd,
'--multixact-ids' => sprintf("%d,%d",
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 82e4bb90dd5..16a07723088 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -28,7 +28,7 @@
#define MultiXactIdIsValid(multi) ((multi) != InvalidMultiXactId)
-#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
+#define MaxMultiXactOffset UINT64CONST(0xFFFFFFFFFFFFFFFF)
/*
* Possible multixact lock modes ("status"). The first four modes are for
@@ -147,7 +147,6 @@ extern void MultiXactSetNextMXact(MultiXactId nextMulti,
extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
MultiXactOffset minMultiOffset);
extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
-extern int MultiXactMemberFreezeThreshold(void);
extern void multixact_twophase_recover(FullTransactionId fxid, uint16 info,
void *recdata, uint32 len);
diff --git a/src/include/c.h b/src/include/c.h
index 757dfff4782..bc92a6f4565 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -670,7 +670,7 @@ typedef uint32 SubTransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
-typedef uint32 MultiXactOffset;
+typedef uint64 MultiXactOffset;
typedef uint32 CommandId;
--
2.43.0
[text/plain] v20-0004-TEST-bump-catversion.patch.txt (775B, 6-v20-0004-TEST-bump-catversion.patch.txt)
download | inline diff:
From 9ae48f1a66fcb2f1651a168c481214f1f9b6e201 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Fri, 24 Oct 2025 11:47:50 +0300
Subject: [PATCH v20 4/5] TEST: bump catversion
To avoid constant CF-bot complains, make catversion bump in a separate
commit.
NOTE: keep it in sync with MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
---
src/include/catalog/catversion.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 18e95179ab6..6a13fa3cdb0 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202510281
+#define CATALOG_VERSION_NO 999999999
#endif
--
2.43.0
[application/octet-stream] v20-0002-Add-pg_upgarde-for-64-bit-multixact-offsets.patch (34.2K, 7-v20-0002-Add-pg_upgarde-for-64-bit-multixact-offsets.patch)
download | inline diff:
From 4076702a39d5ab0c0cd5af860fb47d2b0742c7ee Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Fri, 24 Oct 2025 10:58:37 +0300
Subject: [PATCH v20 2/5] Add pg_upgarde for 64 bit multixact offsets
Author: Maxim Orlov <[email protected]>
Author: Heikki Linnakangas <[email protected]>
---
src/backend/access/transam/multixact.c | 35 +--
src/bin/pg_upgrade/Makefile | 3 +
src/bin/pg_upgrade/meson.build | 3 +
src/bin/pg_upgrade/multixact_new.c | 227 +++++++++++++++++++
src/bin/pg_upgrade/multixact_new.h | 31 +++
src/bin/pg_upgrade/multixact_old.c | 296 +++++++++++++++++++++++++
src/bin/pg_upgrade/multixact_old.h | 31 +++
src/bin/pg_upgrade/pg_upgrade.c | 108 ++++++++-
src/bin/pg_upgrade/pg_upgrade.h | 5 +
src/bin/pg_upgrade/slru_io.c | 240 ++++++++++++++++++++
src/bin/pg_upgrade/slru_io.h | 30 +++
11 files changed, 977 insertions(+), 32 deletions(-)
create mode 100644 src/bin/pg_upgrade/multixact_new.c
create mode 100644 src/bin/pg_upgrade/multixact_new.h
create mode 100644 src/bin/pg_upgrade/multixact_old.c
create mode 100644 src/bin/pg_upgrade/multixact_old.h
create mode 100644 src/bin/pg_upgrade/slru_io.c
create mode 100644 src/bin/pg_upgrade/slru_io.h
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 93a1e4cfd2a..5a13596cd86 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1231,7 +1231,6 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
int slotno;
MultiXactOffset offset;
int length;
- int truelength;
MultiXactId oldestMXact;
MultiXactId nextMXact;
MultiXactId tmpMXact;
@@ -1330,15 +1329,6 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
* we have just for this; the process in charge will signal the CV as soon
* as it has finished writing the multixact offset.
*
- * 3. Because GetNewMultiXactId increments offset zero to offset one to
- * handle case #2, there is an ambiguity near the point of offset
- * wraparound. If we see next multixact's offset is one, is that our
- * multixact's actual endpoint, or did it end at zero with a subsequent
- * increment? We handle this using the knowledge that if the zero'th
- * member slot wasn't filled, it'll contain zero, and zero isn't a valid
- * transaction ID so it can't be a multixact member. Therefore, if we
- * read a zero from the members array, just ignore it.
- *
* This is all pretty messy, but the mess occurs only in infrequent corner
* cases, so it seems better than holding the MultiXactGenLock for a long
* time on every multixact creation.
@@ -1422,6 +1412,9 @@ retry:
LWLockRelease(lock);
lock = NULL;
+ /* A multixid with zero members should not happen */
+ Assert(length > 0);
+
/*
* If we slept above, clean up state; it's no longer needed.
*/
@@ -1430,7 +1423,6 @@ retry:
ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
- truelength = 0;
prev_pageno = -1;
for (int i = 0; i < length; i++, offset++)
{
@@ -1468,36 +1460,27 @@ retry:
xactptr = (TransactionId *)
(MultiXactMemberCtl->shared->page_buffer[slotno] + memberoff);
- if (!TransactionIdIsValid(*xactptr))
- {
- /* Corner case 3: we must be looking at unused slot zero */
- Assert(offset == 0);
- continue;
- }
+ Assert(TransactionIdIsValid(*xactptr));
flagsoff = MXOffsetToFlagsOffset(offset);
bshift = MXOffsetToFlagsBitShift(offset);
flagsptr = (uint32 *) (MultiXactMemberCtl->shared->page_buffer[slotno] + flagsoff);
- ptr[truelength].xid = *xactptr;
- ptr[truelength].status = (*flagsptr >> bshift) & MXACT_MEMBER_XACT_BITMASK;
- truelength++;
+ ptr[i].xid = *xactptr;
+ ptr[i].status = (*flagsptr >> bshift) & MXACT_MEMBER_XACT_BITMASK;
}
LWLockRelease(lock);
- /* A multixid with zero members should not happen */
- Assert(truelength > 0);
-
/*
* Copy the result into the local cache.
*/
- mXactCachePut(multi, truelength, ptr);
+ mXactCachePut(multi, length, ptr);
debug_elog3(DEBUG2, "GetMembers: no cache for %s",
- mxid_to_string(multi, truelength, ptr));
+ mxid_to_string(multi, length, ptr));
*members = ptr;
- return truelength;
+ return length;
}
/*
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 69fcf593cae..42995d53b0b 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -18,11 +18,14 @@ OBJS = \
file.o \
function.o \
info.o \
+ multixact_new.o \
+ multixact_old.o \
option.o \
parallel.o \
pg_upgrade.o \
relfilenumber.o \
server.o \
+ slru_io.o \
tablespace.o \
task.o \
util.o \
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index ac992f0d14b..3e46c4512cf 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -8,11 +8,14 @@ pg_upgrade_sources = files(
'file.c',
'function.c',
'info.c',
+ 'multixact_new.c',
+ 'multixact_old.c',
'option.c',
'parallel.c',
'pg_upgrade.c',
'relfilenumber.c',
'server.c',
+ 'slru_io.c',
'tablespace.c',
'task.c',
'util.c',
diff --git a/src/bin/pg_upgrade/multixact_new.c b/src/bin/pg_upgrade/multixact_new.c
new file mode 100644
index 00000000000..d43442fb9a7
--- /dev/null
+++ b/src/bin/pg_upgrade/multixact_new.c
@@ -0,0 +1,227 @@
+/*
+ * multixact_new.c
+ *
+ * Rewrite pre-v19 multixacts to new format with 64-bit MultiXactOffsets
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/multixact_new.c
+ */
+
+#include "multixact_new.h"
+
+/*
+ * NOTE: Below are a bunch of definitions and simple inline functions that are
+ * copy-pasted from multixact.c
+ */
+
+/* We need four bytes per offset, 8 bytes for the base */
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
+
+static inline int64
+MultiXactIdToOffsetPage(MultiXactId multi)
+{
+ return multi / MULTIXACT_OFFSETS_PER_PAGE;
+}
+
+static inline int
+MultiXactIdToOffsetEntry(MultiXactId multi)
+{
+ return multi % MULTIXACT_OFFSETS_PER_PAGE;
+}
+
+/* We need eight bits per xact, so one xact fits in a byte */
+#define MXACT_MEMBER_BITS_PER_XACT 8
+#define MXACT_MEMBER_FLAGS_PER_BYTE 1
+#define MXACT_MEMBER_XACT_BITMASK ((1 << MXACT_MEMBER_BITS_PER_XACT) - 1)
+
+/* how many full bytes of flags are there in a group? */
+#define MULTIXACT_FLAGBYTES_PER_GROUP 4
+#define MULTIXACT_MEMBERS_PER_MEMBERGROUP \
+ (MULTIXACT_FLAGBYTES_PER_GROUP * MXACT_MEMBER_FLAGS_PER_BYTE)
+/* size in bytes of a complete group */
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (sizeof(TransactionId) * MULTIXACT_MEMBERS_PER_MEMBERGROUP + MULTIXACT_FLAGBYTES_PER_GROUP)
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERGROUPS_PER_PAGE * MULTIXACT_MEMBERS_PER_MEMBERGROUP)
+
+/*
+ * Because the number of items per page is not a divisor of the last item
+ * number (member 0xFFFFFFFF), the last segment does not use the maximum number
+ * of pages, and moreover the last used page therein does not use the same
+ * number of items as previous pages. (Another way to say it is that the
+ * 0xFFFFFFFF member is somewhere in the middle of the last page, so the page
+ * has some empty space after that item.)
+ *
+ * This constant is the number of members in the last page of the last segment.
+ */
+#define MAX_MEMBERS_IN_LAST_MEMBERS_PAGE \
+ ((uint32) ((0xFFFFFFFF % MULTIXACT_MEMBERS_PER_PAGE) + 1))
+
+/* page in which a member is to be found */
+static inline int64
+MXOffsetToMemberPage(MultiXactOffset offset)
+{
+ return offset / MULTIXACT_MEMBERS_PER_PAGE;
+}
+
+/* Location (byte offset within page) of flag word for a given member */
+static inline int
+MXOffsetToFlagsOffset(MultiXactOffset offset)
+{
+ MultiXactOffset group = offset / MULTIXACT_MEMBERS_PER_MEMBERGROUP;
+ int grouponpg = group % MULTIXACT_MEMBERGROUPS_PER_PAGE;
+ int byteoff = grouponpg * MULTIXACT_MEMBERGROUP_SIZE;
+
+ return byteoff;
+}
+
+static inline int
+MXOffsetToFlagsBitShift(MultiXactOffset offset)
+{
+ int member_in_group = offset % MULTIXACT_MEMBERS_PER_MEMBERGROUP;
+ int bshift = member_in_group * MXACT_MEMBER_BITS_PER_XACT;
+
+ return bshift;
+}
+
+/* Location (byte offset within page) of TransactionId of given member */
+static inline int
+MXOffsetToMemberOffset(MultiXactOffset offset)
+{
+ int member_in_group = offset % MULTIXACT_MEMBERS_PER_MEMBERGROUP;
+
+ return MXOffsetToFlagsOffset(offset) +
+ MULTIXACT_FLAGBYTES_PER_GROUP +
+ member_in_group * sizeof(TransactionId);
+}
+
+static inline void
+MXOffsetWrite(char *buf, int entryno, MultiXactOffset offset)
+{
+ MultiXactOffset *offptr = (MultiXactOffset *) buf;
+
+ offptr[entryno] = offset;
+}
+
+MultiXactWriter *
+AllocMultiXactWrite(char *pgdata, MultiXactId firstMulti,
+ MultiXactOffset firstOffset)
+{
+ MultiXactWriter *state = pg_malloc(sizeof(*state));
+ char dir[MAXPGPATH] = {0};
+
+ state->nextMXact = firstMulti;
+ state->nextOffset = firstOffset;
+
+ pg_sprintf(dir, "%s/pg_multixact/offsets", pgdata);
+ state->offset = AllocSlruWrite(dir, false);
+
+ pg_sprintf(dir, "%s/pg_multixact/members", pgdata);
+ state->members = AllocSlruWrite(dir, true /* use long segment names */);
+
+ return state;
+}
+
+/*
+ * Simplified copy of the corresponding server function
+ */
+MultiXactId
+GetNewMultiXactId(MultiXactWriter *state, int nmembers, MultiXactOffset *offset)
+{
+ MultiXactId result;
+
+ /* Handle wraparound of the nextMXact counter */
+ if (state->nextMXact < FirstMultiXactId)
+ state->nextMXact = FirstMultiXactId;
+
+ /* Assign the MXID */
+ result = state->nextMXact;
+
+ /* Reserve the members space, similarly to above. */
+ *offset = state->nextOffset;
+
+ /*
+ * Advance counters. As in GetNewTransactionId(), this must not happen
+ * until after file extension has succeeded!
+ *
+ * We don't care about MultiXactId wraparound here; it will be handled by
+ * the next iteration. But note that nextMXact may be InvalidMultiXactId
+ * or the first value on a segment-beginning page after this routine
+ * exits, so anyone else looking at the variable must be prepared to deal
+ * with either case. Similarly, nextOffset may be zero, but we won't use
+ * that as the actual start offset of the next multixact.
+ */
+ (state->nextMXact)++;
+
+ state->nextOffset += nmembers;
+
+ return result;
+}
+
+/*
+ * Write a new multixact with members.
+ *
+ * Simplified version of the correspoding server function, hence the name.
+ */
+void
+RecordNewMultiXact(MultiXactWriter *state, MultiXactOffset offset,
+ MultiXactId multi, int nmembers, MultiXactMember *members)
+{
+ int64 pageno;
+ int64 prev_pageno;
+ int entryno,
+ i;
+ char *buf;
+
+ pageno = MultiXactIdToOffsetPage(multi);
+ entryno = MultiXactIdToOffsetEntry(multi);
+
+ buf = SlruWriteSwitchPage(state->offset, pageno);
+ MXOffsetWrite(buf, entryno, offset);
+
+ prev_pageno = -1;
+
+ for (i = 0; i < nmembers; i++, offset++)
+ {
+ TransactionId *memberptr;
+ uint32 *flagsptr;
+ uint32 flagsval;
+ int bshift;
+ int flagsoff;
+ int memberoff;
+
+ Assert(members[i].status <= MultiXactStatusUpdate);
+
+ pageno = MXOffsetToMemberPage(offset);
+ memberoff = MXOffsetToMemberOffset(offset);
+ flagsoff = MXOffsetToFlagsOffset(offset);
+ bshift = MXOffsetToFlagsBitShift(offset);
+
+ if (pageno != prev_pageno)
+ {
+ buf = SlruWriteSwitchPage(state->members, pageno);
+ prev_pageno = pageno;
+ }
+
+ memberptr = (TransactionId *) (buf + memberoff);
+
+ *memberptr = members[i].xid;
+
+ flagsptr = (uint32 *) (buf + flagsoff);
+
+ flagsval = *flagsptr;
+ flagsval &= ~(((1 << MXACT_MEMBER_BITS_PER_XACT) - 1) << bshift);
+ flagsval |= (members[i].status << bshift);
+ *flagsptr = flagsval;
+ }
+}
+
+void
+FreeMultiXactWrite(MultiXactWriter *state)
+{
+ FreeSlruWrite(state->offset);
+ FreeSlruWrite(state->members);
+
+ pfree(state);
+}
diff --git a/src/bin/pg_upgrade/multixact_new.h b/src/bin/pg_upgrade/multixact_new.h
new file mode 100644
index 00000000000..33d5d1b8222
--- /dev/null
+++ b/src/bin/pg_upgrade/multixact_new.h
@@ -0,0 +1,31 @@
+/*
+ * multixact_new.h
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/multixact_new.h
+ */
+
+#include "postgres_fe.h"
+
+#include "access/multixact.h"
+
+#include "slru_io.h"
+
+typedef struct MultiXactWriter
+{
+ MultiXactId nextMXact;
+ MultiXactOffset nextOffset;
+
+ SlruSegState *offset;
+ SlruSegState *members;
+} MultiXactWriter;
+
+extern MultiXactWriter *AllocMultiXactWrite(char *pgdata,
+ MultiXactId firstMulti,
+ MultiXactOffset firstOffset);
+extern MultiXactId GetNewMultiXactId(MultiXactWriter *state, int nmembers,
+ MultiXactOffset *offset);
+extern void RecordNewMultiXact(MultiXactWriter *state, MultiXactOffset offset,
+ MultiXactId multi, int nmembers,
+ MultiXactMember *members);
+extern void FreeMultiXactWrite(MultiXactWriter *writer);
diff --git a/src/bin/pg_upgrade/multixact_old.c b/src/bin/pg_upgrade/multixact_old.c
new file mode 100644
index 00000000000..6cc384d2cf2
--- /dev/null
+++ b/src/bin/pg_upgrade/multixact_old.c
@@ -0,0 +1,296 @@
+/*
+ * multixact_old.c
+ *
+ * Rewrite pre-v19 multixacts to new format with 64-bit MultiXactOffsets
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/multixact_old.c
+ */
+
+#include "multixact_old.h"
+
+#include "pg_upgrade.h"
+
+/*
+ * NOTE: below are a bunch of definitions and simple sttaic inline functions
+ * that are copy-pasted from multixact.c from version 18. The only difference
+ * is that we use the OldMultiXactOffset type equal to uint32 instead of
+ * MultiXactOffset which became uint64.
+ */
+
+/* We need four bytes per offset and 8 bytes per base for each page. */
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(OldMultiXactOffset))
+
+static inline int64
+MultiXactIdToOffsetPage(MultiXactId multi)
+{
+ return multi / MULTIXACT_OFFSETS_PER_PAGE;
+}
+
+static inline int
+MultiXactIdToOffsetEntry(MultiXactId multi)
+{
+ return multi % MULTIXACT_OFFSETS_PER_PAGE;
+}
+
+/*
+ * The situation for members is a bit more complex: we store one byte of
+ * additional flag bits for each TransactionId. To do this without getting
+ * into alignment issues, we store four bytes of flags, and then the
+ * corresponding 4 Xids. Each such 5-word (20-byte) set we call a "group", and
+ * are stored as a whole in pages. Thus, with 8kB BLCKSZ, we keep 409 groups
+ * per page. This wastes 12 bytes per page, but that's OK -- simplicity (and
+ * performance) trumps space efficiency here.
+ *
+ * Note that the "offset" macros work with byte offset, not array indexes, so
+ * arithmetic must be done using "char *" pointers.
+ */
+/* We need eight bits per xact, so one xact fits in a byte */
+#define MXACT_MEMBER_BITS_PER_XACT 8
+#define MXACT_MEMBER_FLAGS_PER_BYTE 1
+#define MXACT_MEMBER_XACT_BITMASK ((1 << MXACT_MEMBER_BITS_PER_XACT) - 1)
+
+/* how many full bytes of flags are there in a group? */
+#define MULTIXACT_FLAGBYTES_PER_GROUP 4
+#define MULTIXACT_MEMBERS_PER_MEMBERGROUP \
+ (MULTIXACT_FLAGBYTES_PER_GROUP * MXACT_MEMBER_FLAGS_PER_BYTE)
+/* size in bytes of a complete group */
+#define MULTIXACT_MEMBERGROUP_SIZE \
+ (sizeof(TransactionId) * MULTIXACT_MEMBERS_PER_MEMBERGROUP + MULTIXACT_FLAGBYTES_PER_GROUP)
+#define MULTIXACT_MEMBERGROUPS_PER_PAGE (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
+#define MULTIXACT_MEMBERS_PER_PAGE \
+ (MULTIXACT_MEMBERGROUPS_PER_PAGE * MULTIXACT_MEMBERS_PER_MEMBERGROUP)
+
+/* page in which a member is to be found */
+static inline int64
+MXOffsetToMemberPage(OldMultiXactOffset offset)
+{
+ return offset / MULTIXACT_MEMBERS_PER_PAGE;
+}
+
+/* Location (byte offset within page) of flag word for a given member */
+static inline int
+MXOffsetToFlagsOffset(MultiXactOffset offset)
+{
+ OldMultiXactOffset group = offset / MULTIXACT_MEMBERS_PER_MEMBERGROUP;
+ int grouponpg = group % MULTIXACT_MEMBERGROUPS_PER_PAGE;
+ int byteoff = grouponpg * MULTIXACT_MEMBERGROUP_SIZE;
+
+ return byteoff;
+}
+
+/* Location (byte offset within page) of TransactionId of given member */
+static inline int
+MXOffsetToMemberOffset(OldMultiXactOffset offset)
+{
+ int member_in_group = offset % MULTIXACT_MEMBERS_PER_MEMBERGROUP;
+
+ return MXOffsetToFlagsOffset(offset) +
+ MULTIXACT_FLAGBYTES_PER_GROUP +
+ member_in_group * sizeof(TransactionId);
+}
+
+static inline int
+MXOffsetToFlagsBitShift(OldMultiXactOffset offset)
+{
+ int member_in_group = offset % MULTIXACT_MEMBERS_PER_MEMBERGROUP;
+ int bshift = member_in_group * MXACT_MEMBER_BITS_PER_XACT;
+
+ return bshift;
+}
+
+/*
+ * Construct reader of old multixacts.
+ *
+ * Returns the malloced memory used by the all other calls in this module.
+ */
+OldMultiXactReader *
+AllocOldMultiXactRead(char *pgdata, MultiXactId nextMulti,
+ OldMultiXactOffset nextOffset)
+{
+ OldMultiXactReader *state = state = pg_malloc(sizeof(*state));
+ char dir[MAXPGPATH] = {0};
+
+ state->nextMXact = nextMulti;
+ state->nextOffset = nextOffset;
+
+ pg_sprintf(dir, "%s/pg_multixact/offsets", pgdata);
+ state->offset = AllocSlruRead(dir);
+
+ pg_sprintf(dir, "%s/pg_multixact/members", pgdata);
+ state->members = AllocSlruRead(dir);
+
+ return state;
+}
+
+/*
+ * This is a simplified version of the GetMultiXactIdMembers() server function.
+ *
+ * - Only return the updating member, if any. Upgrade only cares about the
+ * updaters. If there is no updating member, return the first locking-only
+ * member. We don't have any way to represent "no members", but we also don't
+ * need to preserve all the locking members.
+ *
+ * - We don't need to worry about locking and some corner cases because there's
+ * no concurrent activity.
+ */
+void
+GetOldMultiXactIdSingleMember(OldMultiXactReader *state, MultiXactId multi,
+ TransactionId *result, MultiXactStatus *status)
+{
+ MultiXactId nextMXact,
+ nextOffset,
+ tmpMXact;
+ int64 pageno,
+ prev_pageno;
+ int entryno,
+ length;
+ char *buf;
+ OldMultiXactOffset *offptr,
+ offset;
+ TransactionId result_xid = InvalidTransactionId;
+ bool result_isupdate = false;
+
+ nextMXact = state->nextMXact;
+ nextOffset = state->nextOffset;
+
+ /*
+ * See GetMultiXactIdMembers in multixact.c
+ *
+ * Find out the offset at which we need to start reading MultiXactMembers
+ * and the number of members in the multixact. We determine the latter as
+ * the difference between this multixact's starting offset and the next
+ * one's. However, there are some corner cases to worry about:
+ *
+ * 1. This multixact may be the latest one created, in which case there is
+ * no next one to look at. In this case the nextOffset value we just
+ * saved is the correct endpoint.
+ *
+ * 2. The next multixact may still be in process of being filled in...
+ * This cannot happen during upgrade.
+ *
+ * 3. Because GetNewMultiXactId increments offset zero to offset one to
+ * handle case #2, there is an ambiguity near the point of offset
+ * wraparound. If we see next multixact's offset is one, is that our
+ * multixact's actual endpoint, or did it end at zero with a subsequent
+ * increment? We handle this using the knowledge that if the zero'th
+ * member slot wasn't filled, it'll contain zero, and zero isn't a valid
+ * transaction ID so it can't be a multixact member. Therefore, if we
+ * read a zero from the members array, just ignore it.
+ */
+
+ pageno = MultiXactIdToOffsetPage(multi);
+ entryno = MultiXactIdToOffsetEntry(multi);
+
+ buf = SlruReadSwitchPage(state->offset, pageno);
+ offptr = (OldMultiXactOffset *) buf;
+ offptr += entryno;
+ offset = *offptr;
+
+ Assert(offset != 0);
+
+ /*
+ * Use the same increment rule as GetNewMultiXactId(), that is, don't
+ * handle wraparound explicitly until needed.
+ */
+ tmpMXact = multi + 1;
+
+ if (nextMXact == tmpMXact)
+ {
+ /* Corner case 1: there is no next multixact */
+ length = nextOffset - offset;
+ }
+ else
+ {
+ OldMultiXactOffset nextMXOffset;
+
+ /* handle wraparound if needed */
+ if (tmpMXact < FirstMultiXactId)
+ tmpMXact = FirstMultiXactId;
+
+ prev_pageno = pageno;
+
+ pageno = MultiXactIdToOffsetPage(tmpMXact);
+ entryno = MultiXactIdToOffsetEntry(tmpMXact);
+
+ if (pageno != prev_pageno)
+ buf = SlruReadSwitchPage(state->offset, pageno);
+
+ offptr = (OldMultiXactOffset *) buf;
+ offptr += entryno;
+ nextMXOffset = *offptr;
+
+ /*
+ * Corner case 2: next multixact is still being filled in, this must
+ * not happen during upgrade.
+ */
+ Assert(nextMXOffset != 0);
+
+ length = nextMXOffset - offset;
+ }
+
+ prev_pageno = -1;
+ for (int i = 0; i < length; i++, offset++)
+ {
+ TransactionId *xactptr;
+ uint32 *flagsptr;
+ int flagsoff;
+ int bshift;
+ int memberoff;
+ MultiXactStatus st;
+
+ pageno = MXOffsetToMemberPage(offset);
+ memberoff = MXOffsetToMemberOffset(offset);
+
+ if (pageno != prev_pageno)
+ {
+ buf = SlruReadSwitchPage(state->members, pageno);
+ prev_pageno = pageno;
+ }
+
+ xactptr = (TransactionId *) (buf + memberoff);
+ if (!TransactionIdIsValid(*xactptr))
+ {
+ /* Corner case 3: we must be looking at unused slot zero */
+ Assert(offset == 0);
+ continue;
+ }
+
+ flagsoff = MXOffsetToFlagsOffset(offset);
+ bshift = MXOffsetToFlagsBitShift(offset);
+ flagsptr = (uint32 *) (buf + flagsoff);
+
+ st = (*flagsptr >> bshift) & MXACT_MEMBER_XACT_BITMASK;
+
+ /* Verify that there is a single update Xid among the given members. */
+ if (ISUPDATE_from_mxstatus(st))
+ {
+ if (result_isupdate)
+ pg_fatal("multixact %u has more than one updating member",
+ multi);
+ result_xid = *xactptr;
+ result_isupdate = true;
+ }
+ else if (!TransactionIdIsValid(result_xid))
+ result_xid = *xactptr;
+ }
+
+ /* A multixid with zero members should not happen */
+ Assert(TransactionIdIsValid(result_xid));
+
+ *result = result_xid;
+ *status = result_isupdate ? MultiXactStatusUpdate :
+ MultiXactStatusForKeyShare;
+}
+
+/*
+ * Frees the malloced reader.
+ */
+void
+FreeOldMultiXactReader(OldMultiXactReader *state)
+{
+ FreeSlruRead(state->offset);
+ FreeSlruRead(state->members);
+
+ pfree(state);
+}
diff --git a/src/bin/pg_upgrade/multixact_old.h b/src/bin/pg_upgrade/multixact_old.h
new file mode 100644
index 00000000000..8d4659ba6a0
--- /dev/null
+++ b/src/bin/pg_upgrade/multixact_old.h
@@ -0,0 +1,31 @@
+/*
+ * multixact_old.h
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/multixact_old.h
+ */
+
+#include "postgres_fe.h"
+
+#include "access/multixact.h"
+#include "slru_io.h"
+
+typedef uint32 OldMultiXactOffset;
+
+typedef struct OldMultiXactReader
+{
+ MultiXactId nextMXact;
+ OldMultiXactOffset nextOffset;
+
+ SlruSegState *offset;
+ SlruSegState *members;
+} OldMultiXactReader;
+
+extern OldMultiXactReader *AllocOldMultiXactRead(char *pgdata,
+ MultiXactId nextMulti,
+ OldMultiXactOffset nextOffset);
+extern void GetOldMultiXactIdSingleMember(OldMultiXactReader *state,
+ MultiXactId multi,
+ TransactionId *result,
+ MultiXactStatus *status);
+extern void FreeOldMultiXactReader(OldMultiXactReader *reader);
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 490e98fa26f..5432c03a2b0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -49,6 +49,8 @@
#include "common/restricted_token.h"
#include "fe_utils/string_utils.h"
#include "pg_upgrade.h"
+#include "multixact_old.h"
+#include "multixact_new.h"
/*
* Maximum number of pg_restore actions (TOC entries) to process within one
@@ -769,6 +771,82 @@ copy_subdir_files(const char *old_subdir, const char *new_subdir)
check_ok();
}
+/*
+ * Convert pg_multixact/offset and /members to new format with 64-bit offsets.
+ */
+static void
+convert_multixacts(MultiXactId *new_nxtmulti, MultiXactOffset *new_nxtmxoff)
+{
+ MultiXactId oldest_multi,
+ next_multi;
+ OldMultiXactReader *old_reader;
+ MultiXactWriter *new_writer;
+
+ old_reader = AllocOldMultiXactRead(old_cluster.pgdata,
+ old_cluster.controldata.chkpnt_nxtmulti,
+ old_cluster.controldata.chkpnt_nxtmxoff);
+ new_writer = AllocMultiXactWrite(new_cluster.pgdata,
+ old_cluster.controldata.chkpnt_oldstMulti,
+ 1 /* see below */);
+
+ oldest_multi = old_cluster.controldata.chkpnt_oldstMulti;
+ next_multi = old_cluster.controldata.chkpnt_nxtmulti;
+
+ /* handle wraparound */
+ if (next_multi < FirstMultiXactId)
+ next_multi = FirstMultiXactId;
+
+ /*
+ * Read multixids from old files one by one, and write them back in the new
+ * format.
+ *
+ * The locking-only XIDs that may be part of multi-xids don't matter after
+ * upgrade, as there can be no transactions running across upgrade. So as
+ * a little optimization, we only read one member from each multixid: the
+ * one updating one, or if there was no update, arbitrarily the first
+ * locking xid.
+ */
+ for (MultiXactId multi = oldest_multi; multi != next_multi;)
+ {
+ TransactionId xid;
+ MultiXactStatus status;
+ MultiXactMember member;
+ MultiXactId new_multi PG_USED_FOR_ASSERTS_ONLY;
+ MultiXactOffset offset;
+
+ /* Read the old multixid */
+ GetOldMultiXactIdSingleMember(old_reader, multi, &xid, &status);
+
+ /* Write it out in new format */
+ member.xid = xid;
+ member.status = status;
+ new_multi = GetNewMultiXactId(new_writer, 1, &offset);
+
+ Assert(new_multi == multi);
+
+ RecordNewMultiXact(new_writer, offset, multi, 1, &member);
+
+ multi++;
+ /* handle wraparound */
+ if (multi < FirstMultiXactId)
+ multi = FirstMultiXactId;
+ }
+
+ /*
+ * Update the nextMXact/Offset values in the control file to match what we
+ * wrote. The nextMXact should be unchanged, but because we ignored the
+ * locking XIDs members, the nextOffset will be different.
+ */
+ Assert(new_writer->nextMXact == next_multi);
+
+ *new_nxtmulti = next_multi;
+ *new_nxtmxoff = new_writer->nextOffset;
+
+ /* Release resources */
+ FreeMultiXactWrite(new_writer);
+ FreeOldMultiXactReader(old_reader);
+}
+
static void
copy_xact_xlog_xid(void)
{
@@ -816,8 +894,28 @@ copy_xact_xlog_xid(void)
if (old_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER &&
new_cluster.controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
{
- copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
- copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ MultiXactId new_nxtmulti = old_cluster.controldata.chkpnt_nxtmulti;
+ MultiXactOffset new_nxtmxoff = old_cluster.controldata.chkpnt_nxtmxoff;
+
+ /*
+ * If the old server is before the MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
+ * it must have 32-bit multixid offsets, thus it should be converted.
+ */
+ if (old_cluster.controldata.cat_ver < MULTIXACTOFFSET_FORMATCHANGE_CAT_VER &&
+ new_cluster.controldata.cat_ver >= MULTIXACTOFFSET_FORMATCHANGE_CAT_VER)
+ {
+ remove_new_subdir("pg_multixact/members", false);
+ remove_new_subdir("pg_multixact/offsets", false);
+
+ prep_status("Converting pg_multixact/offsets to 64-bit");
+ convert_multixacts(&new_nxtmulti, &new_nxtmxoff);
+ check_ok();
+ }
+ else
+ {
+ copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
+ copy_subdir_files("pg_multixact/members", "pg_multixact/members");
+ }
prep_status("Setting next multixact ID and offset for new cluster");
@@ -826,10 +924,8 @@ copy_xact_xlog_xid(void)
* counters here and the oldest multi present on system.
*/
exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
- new_cluster.bindir,
- old_cluster.controldata.chkpnt_nxtmxoff,
- old_cluster.controldata.chkpnt_nxtmulti,
+ "\"%s/pg_resetwal\" -O %" PRIu64 " -m %u,%u \"%s\"",
+ new_cluster.bindir, new_nxtmxoff, new_nxtmulti,
old_cluster.controldata.chkpnt_oldstMulti,
new_cluster.pgdata);
check_ok();
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index e86336f4be9..127b2cb00fa 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -114,6 +114,11 @@ extern char *output_files[];
*/
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231
+/*
+ * Swicth from 32-bit to 64-bit for multixid offsets.
+ */
+#define MULTIXACTOFFSET_FORMATCHANGE_CAT_VER 999999999
+
/*
* large object chunk size added to pg_controldata,
* commit 5f93c37805e7485488480916b4585e098d3cc883
diff --git a/src/bin/pg_upgrade/slru_io.c b/src/bin/pg_upgrade/slru_io.c
new file mode 100644
index 00000000000..4e823199303
--- /dev/null
+++ b/src/bin/pg_upgrade/slru_io.c
@@ -0,0 +1,240 @@
+/*
+ * slru_io.c
+ *
+ * Routines for reading and writing SLRU files during upgrade.
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/slru_io.c
+ */
+
+#include "postgres_fe.h"
+
+#include <fcntl.h>
+
+#include "pg_upgrade.h"
+#include "slru_io.h"
+
+#include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
+#include "port/pg_iovec.h"
+
+/*
+ * State for reading or writing an SLRU, with a one page buffer.
+ */
+typedef struct SlruSegState
+{
+ bool writing;
+ bool long_segment_names;
+
+ char *dir;
+ char *fn;
+ int fd;
+ int64 segno;
+ uint64 pageno;
+
+ PGAlignedBlock buf;
+} SlruSegState;
+
+static inline SlruSegState *
+AllocSlruSegState(char *dir)
+{
+ SlruSegState *state = pg_malloc(sizeof(*state));
+
+ state->segno = -1;
+ state->pageno = 0;
+ state->dir = pstrdup(dir);
+ state->fd = -1;
+ state->fn = NULL;
+
+ return state;
+}
+
+static inline void
+SlruFlush(SlruSegState *state)
+{
+ struct iovec iovec = {
+ .iov_base = &state->buf,
+ .iov_len = BLCKSZ,
+ };
+ off_t offset;
+
+ if (state->segno == -1)
+ return;
+
+ offset = (state->pageno % SLRU_PAGES_PER_SEGMENT) * BLCKSZ;
+
+ if (pg_pwritev_with_retry(state->fd, &iovec, 1, offset) < 0)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+}
+
+/*
+ * Create slru reader for dir.
+ *
+ * Returns the malloced memory used by the all other read calls in this module.
+ */
+SlruSegState *
+AllocSlruRead(char *dir)
+{
+ SlruSegState *state = AllocSlruSegState(dir);
+
+ state->writing = false;
+
+ return state;
+}
+
+/*
+ * Open given page for reading.
+ *
+ * Reading can be done in random order.
+ */
+char *
+SlruReadSwitchPage(SlruSegState *state, uint64 pageno)
+{
+ int64 segno;
+
+ Assert(!state->writing); /* read only mode */
+
+ if (state->segno != -1 && pageno == state->pageno)
+ return state->buf.data;
+
+ segno = pageno / SLRU_PAGES_PER_SEGMENT;
+ if (segno != state->segno)
+ {
+ if (state->segno != -1)
+ {
+ close(state->fd);
+ state->fd = -1;
+
+ pg_free(state->fn);
+ state->fn = NULL;
+ }
+
+ /* Open new segment */
+ state->fn = psprintf("%s/%04X", state->dir, (unsigned int) segno);
+ if ((state->fd = open(state->fn, O_RDONLY | PG_BINARY, 0)) < 0)
+ pg_fatal("could not open file \"%s\": %m", state->fn);
+ }
+
+ state->segno = segno;
+
+ {
+ struct iovec iovec = {
+ .iov_base = &state->buf,
+ .iov_len = BLCKSZ,
+ };
+ off_t offset = (pageno % SLRU_PAGES_PER_SEGMENT) * BLCKSZ;
+
+ if (pg_preadv(state->fd, &iovec, 1, offset) < 0)
+ pg_fatal("could not read file \"%s\": %m", state->fn);
+
+ state->pageno = pageno;
+ }
+
+ return state->buf.data;
+}
+
+/*
+ * Frees the malloced reader.
+ */
+void
+FreeSlruRead(SlruSegState *state)
+{
+ Assert(!state->writing); /* read only mode */
+
+ close(state->fd);
+ pg_free(state);
+}
+
+/*
+ * Open the given page for writing.
+ *
+ * NOTE: This uses O_EXCL when stepping to a new segment, so this assumes that
+ * each segment is written in full before moving on to next one. This
+ * limitation would be easy to lift if needed, but it fits the usage pattern of
+ * current callers.
+ */
+char *
+SlruWriteSwitchPage(SlruSegState *state, uint64 pageno)
+{
+ int64 segno = pageno / SLRU_PAGES_PER_SEGMENT;
+ off_t offset;
+
+ if (state->segno != -1 && pageno == state->pageno)
+ return state->buf.data;
+
+ segno = pageno / SLRU_PAGES_PER_SEGMENT;
+ offset = (pageno % SLRU_PAGES_PER_SEGMENT) * BLCKSZ;
+
+ SlruFlush(state);
+ memset(state->buf.data, 0, BLCKSZ);
+
+ if (segno != state->segno)
+ {
+ if (state->segno != -1)
+ {
+ close(state->fd);
+ state->fd = -1;
+
+ pg_free(state->fn);
+ state->fn = NULL;
+ }
+
+ /* Create the segment */
+ if (state->long_segment_names)
+ {
+ Assert(segno >= 0 && segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+ state->fn = psprintf("%s/%015" PRIX64, state->dir, segno);
+ }
+ else
+ {
+ Assert(segno >= 0 && segno <= INT64CONST(0xFFFFFF));
+ state->fn = psprintf("%s/%04X", state->dir, (unsigned int) segno);
+ }
+
+ if ((state->fd = open(state->fn, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
+ pg_file_create_mode)) < 0)
+ {
+ pg_fatal("could not create file \"%s\": %m", state->fn);
+ }
+
+ state->segno = segno;
+
+ if (offset > 0 && pg_pwrite_zeros(state->fd, offset, 0) < 0)
+ pg_fatal("could not write file \"%s\": %m", state->fn);
+ }
+
+ state->pageno = pageno;
+
+ return state->buf.data;
+}
+
+/*
+ * Create slru writer for dir.
+ *
+ * Returns the malloced memory used by the all other write calls in this module.
+ */
+SlruSegState *
+AllocSlruWrite(char *dir, bool long_segment_names)
+{
+ SlruSegState *state = AllocSlruSegState(dir);
+
+ state->writing = true;
+ state->long_segment_names = long_segment_names;
+
+ return state;
+}
+
+/*
+ * Frees the malloced writer.
+ */
+void
+FreeSlruWrite(SlruSegState *state)
+{
+ Assert(state->writing);
+
+ SlruFlush(state);
+
+ close(state->fd);
+ pg_free(state);
+}
diff --git a/src/bin/pg_upgrade/slru_io.h b/src/bin/pg_upgrade/slru_io.h
new file mode 100644
index 00000000000..920b8ae82e2
--- /dev/null
+++ b/src/bin/pg_upgrade/slru_io.h
@@ -0,0 +1,30 @@
+/*
+ * slru_io.h
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ * src/bin/pg_upgrade/slru_io.h
+ */
+
+/*
+ * Some kind of iterator associated with a particular SLRU segment. The idea is
+ * to specify the segment and page number and then move through the pages.
+ */
+
+#include "postgres_fe.h"
+
+/*
+ * See access/slru.h
+ *
+ * Copy here, since slru.h could not be included in fe code.
+ */
+#define SLRU_PAGES_PER_SEGMENT 32
+
+typedef struct SlruSegState SlruSegState;
+
+extern SlruSegState *AllocSlruRead(char *dir);
+extern char *SlruReadSwitchPage(SlruSegState *state, uint64 pageno);
+extern void FreeSlruRead(SlruSegState *state);
+
+extern SlruSegState *AllocSlruWrite(char *dir, bool long_segment_names);
+extern char *SlruWriteSwitchPage(SlruSegState *state, uint64 pageno);
+extern void FreeSlruWrite(SlruSegState *state);
--
2.43.0
view thread (79+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: POC: make mxidoff 64 bits
In-Reply-To: <CACG=ezb53xpK39y2oe8H5BAAEnrtpqAjTF_xKoFZh8z6PJY7zA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox