RE: Adding REPACK [concurrently] - Zhijie Hou (Fujitsu)

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Zhijie Hou (Fujitsu) <[email protected]>
To: Amit Kapila <[email protected]>
Cc: Alvaro Herrera <[email protected]>
Cc: Antonin Houska <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: Srinath Reddy Sadipiralla <[email protected]>
Cc: Mihail Nikalayeu <[email protected]>
Cc: Matthias van de Meent <[email protected]>
Cc: Pg Hackers <[email protected]>
Cc: Robert Treat <[email protected]>
Subject: RE: Adding REPACK [concurrently]
Date: Thu, 28 May 2026 05:18:34 +0000
Message-ID: <TY4PR01MB177181DF3B3DA853AA2298D8D94092@TY4PR01MB17718.jpnprd01.prod.outlook.com> (raw)
In-Reply-To: <CAA4eK1LFqdh4=b4t3OTpmCMs211-9dJEG8JdcV1ikN94LBum9w@mail.gmail.com>
References: <TY4PR01MB17718B44164522D0798F8E898940A2@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<[email protected]>
	<TY4PR01MB177189B910069E99897E4323D94082@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CAA4eK1Jpfe+nhV_KX0qKHTdLjwLEOPFVLrvCrXxLUxWptT6x5g@mail.gmail.com>
	<CAA4eK1LFqdh4=b4t3OTpmCMs211-9dJEG8JdcV1ikN94LBum9w@mail.gmail.com>

On Thursday, May 28, 2026 11:34 AM Amit Kapila <[email protected]> wrote:
> On Wed, May 27, 2026 at 5:31 PM Amit Kapila <[email protected]>
> wrote:
> >
> > On Wed, May 27, 2026 at 1:08 AM Zhijie Hou (Fujitsu)
> > <[email protected]> wrote:
> > >
> > > 0001 remains unchanged.
> > >
> >
> > Few minor comments:
> > =================
> 
> Commit message says: "This change does not advance catalog_xmin.
> REPACK already holds a snapshot that prevents catalog dead tuple removal,
> so catalog_xmin handling can be addressed independently.".
> Isn't it equally important to advance this, otherwise, for long running REPACKs
> dead tuples will be accumulated needlessly? If so, do we have any ideas to
> avoid this?

My understanding is that dead tuple accumulation is common to all long-running
commands (including CLUSTER, VACUUM FULL, and REPACK without CONCURRENTLY). As
long as a command holds a snapshot for a long time while scanning and copying
data, the backend xmin will cause similar accumulation. So, this doesn't seem
like a new issue to me, and given that catalog_xmin only affect tuples in system
catalog which is less harmful, I thought it could be handled independently.
There was a proposal to improve this case in [1]. Sorry if I've missed something.

Attaching the v4 patch which improved the comments and commit message as
suggested.

[1] https://www.postgresql.org/message-id/125085.1775827305%40localhost

Best Regards,
Hou zj


Attachments:

  [application/octet-stream] v4-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch (3.6K, 2-v4-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch)
  download | inline diff:
From 74881e1bf03da8a4772b3bc5a24542b6dfb51042 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Fri, 10 Apr 2026 16:24:55 +0800
Subject: [PATCH v4 1/2] Allow old WAL recycling during REPACK CONCURRENTLY

During REPACK CONCURRENTLY, logical decoding can keep replication
slot.restart_lsn pinned behind the oldest running transaction, which is often
the long-lived REPACK transaction itself. As a result, old WAL segments are
retained longer than necessary.

This commit advances the replication slot each time WAL insertion crosses a
segment boundary, so obsolete WAL files can be recycled while REPACK is still
running.

This change does not advance catalog_xmin. REPACK already holds a snapshot that
prevents catalog dead tuple removal, so catalog_xmin handling can be addressed
independently.

Additionally, this commit improves LogicalConfirmReceivedLocation to compute the
oldest restart LSN whenever slot.restart_lsn is updated. Previously, this
function performed the computation only when catalog_xmin was updated, which was
less problematic because catalog_xmin typically advances in most replication
cases, but not for REPACK.
---
 src/backend/commands/repack_worker.c      | 20 +++++++++++++++++++-
 src/backend/replication/logical/logical.c |  8 +++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/repack_worker.c b/src/backend/commands/repack_worker.c
index 4f82eb46bec..56974cbc1f5 100644
--- a/src/backend/commands/repack_worker.c
+++ b/src/backend/commands/repack_worker.c
@@ -397,12 +397,30 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 
 			/*
 			 * If WAL segment boundary has been crossed, inform the decoding
-			 * system that the catalog_xmin can advance.
+			 * system that the slot can advance.
+			 *
+			 * Once REPACK begins copying data to the new table, the logical
+			 * decoding machinery prevents the slot from advancing beyond the
+			 * oldest running transaction (which is the REPACK transaction
+			 * itself). As a result, restart_lsn and catalog_xmin can no
+			 * longer advance automatically.
+			 *
+			 * To allow old WAL files to be recycled, we manually advance the
+			 * slot each time a WAL segment boundary is crossed. This is safe
+			 * because the REPACK slot is temporary and will be dropped
+			 * automatically if the REPACK command fails. There is no scenario
+			 * where this slot needs to restart decoding from an earlier
+			 * position while still alive.
+			 *
+			 * We do not advance catalog_xmin here because the REPACK
+			 * transaction anyway holds a snapshot that prevents catalog dead
+			 * tuple removal.
 			 */
 			end_lsn = ctx->reader->EndRecPtr;
 			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
 			if (segno_new != repack_current_segment)
 			{
+				LogicalIncreaseRestartDecodingForSlot(end_lsn, end_lsn);
 				LogicalConfirmReceivedLocation(end_lsn);
 				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
 					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index b969caae72e..8b8095bd5d8 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1910,8 +1910,14 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 			SpinLockRelease(&MyReplicationSlot->mutex);
 
 			ReplicationSlotsComputeRequiredXmin(false);
-			ReplicationSlotsComputeRequiredLSN();
 		}
+
+		/*
+		 * Now the new restart_lsn is safely on disk, recompute the global WAL
+		 * retention requirement.
+		 */
+		if (updated_restart)
+			ReplicationSlotsComputeRequiredLSN();
 	}
 	else
 	{
-- 
2.43.0



  [application/octet-stream] v4-0002-Add-a-test-for-repack-concurrently.patch (3.2K, 3-v4-0002-Add-a-test-for-repack-concurrently.patch)
  download | inline diff:
From 3a82ef6b68280efb1ed008cf436b5a33eed1488f Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 27 May 2026 15:58:05 +0800
Subject: [PATCH v4 2/2] Add a test for repack concurrently

---
 .../recovery/t/046_checkpoint_logical_slot.pl | 74 +++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index 66761bf56c1..aa32859dd15 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -226,4 +226,78 @@ is( $standby->safe_psql(
 	"t",
 	'logical slot is not invalidated');
 
+# Verify that the REPACK slot's restart_lsn can advance while REPACK
+# CONCURRENTLY is still running, allowing WAL files to be recycled during this
+# period.
+
+# Create the table to be repacked and populate it with some data.
+$node->safe_psql(
+	'postgres',
+	q{
+CREATE TABLE repack_test(i int PRIMARY KEY, t text);
+INSERT INTO repack_test
+SELECT g, md5(g::text)
+FROM generate_series(1, 100) g;
+});
+
+# Pause the REPACK command in the middle of its execution so that the decoding
+# worker continues running, allowing us to test slot restart_lsn advancement
+# later.
+$node->safe_psql('postgres',
+	q(select injection_points_attach('repack-concurrently-before-lock','wait'))
+);
+
+my $repack = $node->background_psql('postgres');
+$repack->query_until(
+	qr/repack_started/,
+	q(
+\echo repack_started
+REPACK (CONCURRENTLY) repack_test;
+\q
+));
+
+# Wait until REPACK reaches the injection point.
+$node->wait_for_event('client backend', 'repack-concurrently-before-lock');
+
+my $restart_lsn_before = $node->safe_psql('postgres',
+	"SELECT restart_lsn FROM pg_replication_slots WHERE slot_name ~ '^repack_[0-9]+' AND slot_type = 'logical' AND temporary;");
+
+# Verify that the replication slot created by the subscription exists and has a
+# valid restart_lsn.
+ok(defined($restart_lsn_before) && $restart_lsn_before ne '',
+	'REPACK slot has restart_lsn');
+
+my $segment_before = $node->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn_before')");
+my $segment_before_path = $node->data_dir . "/pg_wal/$segment_before";
+ok(-f $segment_before_path,
+	"segment for initial restart_lsn exists: $segment_before");
+
+# Switch WAL file on the primary while REPACK is still running and then force
+# WAL removal/recycling with a checkpoint.
+$node->advance_wal(1);
+
+# Wait until the REPACK slot's restart_lsn advances
+ok( $node->poll_query_until(
+	'postgres', qq[
+    SELECT count(*) > 0
+	FROM pg_replication_slots
+	WHERE slot_name ~ '^repack_[0-9]+'
+	  AND slot_type = 'logical'
+	  AND temporary
+	  AND restart_lsn IS NOT NULL
+	  AND restart_lsn <> '$restart_lsn_before'::pg_lsn]),
+	'REPACK slot restart_lsn advances while command is still running');
+
+$node->safe_psql('postgres', 'CHECKPOINT');
+
+# Test that the old WAL segment was recycled
+ok(!-f $segment_before_path,
+	'old WAL segment was recycled while REPACK CONCURRENTLY was running');
+
+$node->safe_psql('postgres',
+	"SELECT injection_points_wakeup('repack-concurrently-before-lock')");
+
+$repack->quit;
+
 done_testing();
-- 
2.43.0

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: RE: Adding REPACK [concurrently]
  In-Reply-To: <TY4PR01MB177181DF3B3DA853AA2298D8D94092@TY4PR01MB17718.jpnprd01.prod.outlook.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox