public inbox for [email protected]  
help / color / mirror / Atom feed
From: Bharath Rupireddy <[email protected]>
To: Masahiko Sawada <[email protected]>
Cc: Srinath Reddy Sadipiralla <[email protected]>
Cc: SATYANARAYANA NARLAPURAM <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: John H <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Introduce XID age based replication slot invalidation
Date: Tue, 31 Mar 2026 10:20:56 -0700
Message-ID: <CALj2ACVGpVHuqchPPFWdiLDN-PDPCEe=sU43YB7nqafE+VMXaQ@mail.gmail.com> (raw)
In-Reply-To: <CAD21AoB4MsTpG5JEkifght_tQ91VHJO_8kpsDCrG-z9qkkmN5g@mail.gmail.com>
References: <CA+-JvFsMHckBMzsu5Ov9HCG3AFbMh056hHy1FiXazBRtZ9pFBg@mail.gmail.com>
	<OSCPR01MB149667506BCFD684FEFDCB919F511A@OSCPR01MB14966.jpnprd01.prod.outlook.com>
	<CALj2ACUmPbkcj4y4oeXvzUkBejG68QDtrFF7QHDC_qz2vQcTCg@mail.gmail.com>
	<CAHg+QDfnK7tQxsEZox=kOkYfqANmL70mwn0N=eRrJxE1Z+1ygg@mail.gmail.com>
	<CALj2ACX_o+dKeAaK76mpAtG646UnDHpGUWziUkCvicVz8mz6=A@mail.gmail.com>
	<CAD21AoATM=Un8ejnbcDQ7q+CaCoxpkA7Ln9bacvQEoymqvjPug@mail.gmail.com>
	<CALj2ACUmUb=CLEyfsQrW0WAkF6Y9fiBfG6pnPjepfOM7A1XReA@mail.gmail.com>
	<CAD21AoAg6x57a8LoP2s+0vgizp9QGHcLGJL9bwh7kzEJq3arBg@mail.gmail.com>
	<CALj2ACWcaSkfMAQu3s6ZkTZuoFvVRD=DkxXbBwC33PL9+kzsqw@mail.gmail.com>
	<CAD21AoBEBqQONiZxvnUYOu814yB2tRPrmX=7KqvL+f3ae7250w@mail.gmail.com>
	<CALj2ACUenekLgjMr8x4DyuU=zKZ4eNqW9XF-1PovSctkY2VA0Q@mail.gmail.com>
	<CAFC+b6pO44=zGqwijzrcyGGTYCM51Y7zS5uQX0_nWjsxW9i3QQ@mail.gmail.com>
	<CALj2ACU=A31kqHELyaF-=2vnyed=cO2JNQt+vY92KtHLF-m8sg@mail.gmail.com>
	<CAD21AoB4MsTpG5JEkifght_tQ91VHJO_8kpsDCrG-z9qkkmN5g@mail.gmail.com>

Hi,

On Mon, Mar 30, 2026 at 5:13 PM Masahiko Sawada <[email protected]> wrote:
>
> I've reviewed the v6 patch. Here are some comments.

Thank you for reviewing the patch.

>  bool
>  vacuum_get_cutoffs(Relation rel, const VacuumParams params,
> -                  struct VacuumCutoffs *cutoffs)
> +                  struct VacuumCutoffs *cutoffs,
> +                  TransactionId *slot_xmin,
> +                  TransactionId *slot_catalog_xmin)
>
> How about storing both slot_xmin and catalog_xmin into VacuumCutoffs?

Done.

> ---
> -   if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED |
> RS_INVAL_IDLE_TIMEOUT,
> +   possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT |
> +       RS_INVAL_XID_AGE;
> +
> +   if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
>                                            _logSegNo, InvalidOid,
> +                                          InvalidTransactionId,
> +                                          max_slot_xid_age > 0 ?
> +                                          ReadNextTransactionId() :
>                                            InvalidTransactionId))
>
> It's odd to me that we specify RS_INVAL_XID_AGE while passing
> InvalidTransactionId. I think we can specify RS_INVAL_XID_AGE along
> with a valid recentXId only when we'd like to check the slots based on
> their XIDs.

Done.

> ---
> +   /* Check if the slot needs to be invalidated due to max_slot_xid_age GUC */
> +   if ((possible_causes & RS_INVAL_XID_AGE) && CanInvalidateXidAgedSlot(s))
> +   {
> +       TransactionId xidLimit;
> +
> +       Assert(TransactionIdIsValid(recentXid));
> +
> +       xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
> +
>
> I think we can avoid calculating xidLimit for every slot by
> calculating it in InvalidatePossiblyObsoleteSlot() and passing it to
> DetermineSlotInvalidationCause().

Done.

> ---
>   */
>  TransactionId
>  GetOldestNonRemovableTransactionId(Relation rel)
> +{
> +   return GetOldestNonRemovableTransactionIdExt(rel, NULL, NULL);
> +}
> +
> +/*
> + * Same as GetOldestNonRemovableTransactionId(), but also returns the
> + * replication slot xmin and catalog_xmin from the same ComputeXidHorizons()
> + * call.  This avoids a separate ProcArrayLock acquisition when the caller
> + * needs both values.
> + */
> +TransactionId
> +GetOldestNonRemovableTransactionIdExt(Relation rel,
> +                                     TransactionId *slot_xmin,
> +                                     TransactionId *slot_catalog_xmin)
>  {
>
> I understand that the primary reason why the patch introduces another
> variant of GetOldestNonRemovableTransactionId() is to avoid extra
> ProcArrayLock acquision to get replication slot xmin and catalog_xmin.
> While it's not very elegant, I find that it would not be bad because
> otherwise autovacuum takes extra ProcArrayLock (in shared mode) for
> every table to vacuum. The ProcArrayLock is already known
> high-contented lock it would be better to avoid taking it once more.
> If others think differently, we can just call
> ProcArrayGetReplicationSlotXmin() separately and compare them to the
> limit of XID-age based slot invalidation.

I understand the concerns around the ProcArrayLock and I think a new
function to return the computed slot's xmin and catalog_xmin is good.

> Having said that, I personally don't want to add new instructions to
> the existing GetOldestNonRemovableTransactionId(). I guess we might
> want to make both the existing function and new function call a common
> (inline) function that takes ComputeXidHorizonsResult and returns
> appropriate transaction id based on the given relation .

Done.

> ---
> +   # Do some work to advance xids
> +   $node->safe_psql(
> +       'postgres', qq[
> +       do \$\$
> +       begin
> +       for i in 1..$nxids loop
> +           -- use an exception block so that each iteration eats an XID
> +           begin
> +           insert into $table_name values (i);
> +           exception
> +           when division_by_zero then null;
> +           end;
> +       end loop;
> +       end\$\$;
> +   ]);
>
> I think it's fater to use pg_current_xact_id() instead.

Done. I pulled this from an existing test case in 001_stream_rep.pl.
Used the pg_current_xact_id approach. Testing times stay the same i.e.
9 wallclock secs.

> ---
> +   else
> +   {
> +       $node->safe_psql('postgres', "VACUUM");
> +   }
>
> We don't need to vacuum all tables here.

Fixed.

> ---
> +# Configure primary with XID age settings. Set autovacuum_naptime high so
> +# that the checkpointer (not vacuum) triggers the invalidation.
> +my $max_slot_xid_age = 500;
> +$primary5->append_conf(
> +   'postgresql.conf', qq{
> +max_slot_xid_age = $max_slot_xid_age
> +autovacuum_naptime = '1h'
> +});
>
> I think that it's better to disable autovacuum than setting a large number.

Done.

> ---
> +# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_age
> +# GUC (via checkpoint).
>
> I think that we can say "physical slot" instead of standby's slot to
> avoid confusion as I thought standby's slot is a slot created on the
> standby at the first glance.

Fixed.

> ---
> Do we have tests for invalidating slots on the standbys?

Added a test case for this.

Please find the attached v7 patches for further review. Thank you!

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v7-0001-Add-XID-age-based-replication-slot-invalidation.patch (31.7K, 2-v7-0001-Add-XID-age-based-replication-slot-invalidation.patch)
  download | inline diff:
From af66f78a12d15ed26356eaead75dfac3ccd56819 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 31 Mar 2026 16:09:15 +0000
Subject: [PATCH v7 1/2] Add XID age based replication slot invalidation

Introduce max_slot_xid_age, a GUC that invalidates replication
slots whose xmin or catalog_xmin exceeds the specified age.
Disabled by default.

Idle or forgotten replication slots can hold back vacuum, leading
to bloat and eventually XID wraparound. In the worst case this
requires dropping the slot and single-user mode vacuuming. This
setting avoids that by proactively invalidating slots that have
fallen too far behind.

Invalidation checks are performed once per relation during vacuum
(both vacuum command and autovacuum), and also by the checkpointer
during checkpoints and restartpoints.
---
 doc/src/sgml/config.sgml                      |  40 ++++
 doc/src/sgml/system-views.sgml                |   8 +
 src/backend/access/heap/vacuumlazy.c          |  15 ++
 src/backend/access/transam/xlog.c             |  34 +++-
 src/backend/commands/vacuum.c                 |   4 +-
 src/backend/replication/slot.c                | 130 ++++++++++++-
 src/backend/storage/ipc/procarray.c           |  60 ++++--
 src/backend/storage/ipc/standby.c             |   3 +-
 src/backend/utils/misc/guc_parameters.dat     |   8 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/commands/vacuum.h                 |   9 +
 src/include/replication/slot.h                |  10 +-
 src/include/storage/procarray.h               |   3 +
 src/test/recovery/t/019_replslot_limit.pl     | 178 ++++++++++++++++++
 14 files changed, 478 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 229f41353eb..46aac59cb20 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4764,6 +4764,46 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-slot-xid-age" xreflabel="max_slot_xid_age">
+      <term><varname>max_slot_xid_age</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_slot_xid_age</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Invalidate replication slots whose <literal>xmin</literal> (the oldest
+        transaction that this slot needs the database to retain) or
+        <literal>catalog_xmin</literal> (the oldest transaction affecting the
+        system catalogs that this slot needs the database to retain) has reached
+        the age specified by this setting. This invalidation check happens
+        during vacuum (both <command>VACUUM</command> command and autovacuum)
+        and during checkpoints.
+        A value of zero (which is default) disables this feature. Users can set
+        this value anywhere from zero to two billion. This parameter can only be
+        set in the <filename>postgresql.conf</filename> file or on the server
+        command line.
+       </para>
+
+       <para>
+        Idle or forgotten replication slots can hold back vacuum, leading to
+        bloat and eventually transaction ID wraparound. This setting avoids
+        that by invalidating slots that have fallen too far behind.
+        See <xref linkend="routine-vacuuming"/> for more details.
+       </para>
+
+       <para>
+        Note that this invalidation mechanism is not applicable for slots
+        on the standby server that are being synced from the primary server
+        (i.e., standby slots having
+        <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>
+        value <literal>true</literal>). Synced slots are always considered to
+        be inactive because they don't perform logical decoding to produce
+        changes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-sender-timeout" xreflabel="wal_sender_timeout">
       <term><varname>wal_sender_timeout</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 9ee1a2bfc6a..1a507b430f9 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,14 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
           <xref linkend="guc-idle-replication-slot-timeout"/> duration.
          </para>
         </listitem>
+        <listitem>
+         <para>
+          <literal>xid_aged</literal> means that the slot's
+          <literal>xmin</literal> or <literal>catalog_xmin</literal>
+          has reached the age specified by
+          <xref linkend="guc-max-slot-xid-age"/> parameter.
+         </para>
+        </listitem>
        </itemizedlist>
       </para></entry>
      </row>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 24001b27387..98822823d2e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -147,6 +147,7 @@
 #include "pgstat.h"
 #include "portability/instr_time.h"
 #include "postmaster/autovacuum.h"
+#include "replication/slot.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "storage/latch.h"
@@ -799,6 +800,20 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
 	 * to increase the number of dead tuples it can prune away.)
 	 */
 	vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
+
+	/*
+	 * Try to invalidate XID-aged replication slots. Use the slot xmin values
+	 * obtained from the same horizons computation that produced OldestXmin,
+	 * avoiding an extra ProcArrayLock acquisition.
+	 */
+	if (MaybeInvalidateXIDAgedSlots(vacrel->cutoffs.slot_xmin,
+									vacrel->cutoffs.slot_catalog_xmin))
+	{
+		/* Recompute cutoffs after slot invalidation */
+		vacrel->aggressive = vacuum_get_cutoffs(rel, params,
+												&vacrel->cutoffs);
+	}
+
 	vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
 	vacrel->vistest = GlobalVisTestFor(rel);
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2c1c6f88b74..70c1d5c5559 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7019,6 +7019,8 @@ CreateCheckPoint(int flags)
 	VirtualTransactionId *vxids;
 	int			nvxids;
 	int			oldXLogAllowed = 0;
+	uint32		possibleInvalidationCauses;
+	TransactionId recentXid;
 
 	/*
 	 * An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7447,9 +7449,20 @@ CreateCheckPoint(int flags)
 	 */
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 	KeepLogSeg(recptr, &_logSegNo);
-	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
+
+	possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
+	recentXid = InvalidTransactionId;
+
+	if (max_slot_xid_age > 0)
+	{
+		possibleInvalidationCauses |= RS_INVAL_XID_AGE;
+		recentXid = ReadNextTransactionId();
+	}
+
+	if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
 										   _logSegNo, InvalidOid,
-										   InvalidTransactionId))
+										   InvalidTransactionId,
+										   recentXid))
 	{
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
@@ -7730,6 +7743,8 @@ CreateRestartPoint(int flags)
 	XLogRecPtr	endptr;
 	XLogSegNo	_logSegNo;
 	TimestampTz xtime;
+	uint32		possibleInvalidationCauses;
+	TransactionId recentXid;
 
 	/* Concurrent checkpoint/restartpoint cannot happen */
 	Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7904,9 +7919,19 @@ CreateRestartPoint(int flags)
 
 	INJECTION_POINT("restartpoint-before-slot-invalidation", NULL);
 
-	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
+	possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
+	recentXid = InvalidTransactionId;
+
+	if (max_slot_xid_age > 0)
+	{
+		possibleInvalidationCauses |= RS_INVAL_XID_AGE;
+		recentXid = ReadNextTransactionId();
+	}
+
+	if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
 										   _logSegNo, InvalidOid,
-										   InvalidTransactionId))
+										   InvalidTransactionId,
+										   recentXid))
 	{
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
@@ -8770,6 +8795,7 @@ xlog_redo(XLogReaderState *record)
 				 */
 				InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
 												   0, InvalidOid,
+												   InvalidTransactionId,
 												   InvalidTransactionId);
 			}
 			else if (sync_replication_slots)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 766a518c7a1..d78a05a4978 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,7 +1133,9 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams params,
 	 * that only one vacuum process can be working on a particular table at
 	 * any time, and that each vacuum is always an independent transaction.
 	 */
-	cutoffs->OldestXmin = GetOldestNonRemovableTransactionId(rel);
+	cutoffs->OldestXmin = GetOldestNonRemovableTransactionIdExt(rel,
+																&cutoffs->slot_xmin,
+																&cutoffs->slot_catalog_xmin);
 
 	Assert(TransactionIdIsNormal(cutoffs->OldestXmin));
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a9092fc2382..20729d2fb42 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -117,6 +117,7 @@ static const SlotInvalidationCauseMap SlotInvalidationCauses[] = {
 	{RS_INVAL_HORIZON, "rows_removed"},
 	{RS_INVAL_WAL_LEVEL, "wal_level_insufficient"},
 	{RS_INVAL_IDLE_TIMEOUT, "idle_timeout"},
+	{RS_INVAL_XID_AGE, "xid_aged"},
 };
 
 /*
@@ -158,6 +159,12 @@ int			max_replication_slots = 10; /* the maximum number of replication
  */
 int			idle_replication_slot_timeout_secs = 0;
 
+/*
+ * Invalidate replication slots that have xmin or catalog_xmin older
+ * than the specified age; '0' disables it.
+ */
+int			max_slot_xid_age = 0;
+
 /*
  * This GUC lists streaming replication standby server slot names that
  * logical WAL sender processes will wait for.
@@ -1780,7 +1787,10 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 					   XLogRecPtr restart_lsn,
 					   XLogRecPtr oldestLSN,
 					   TransactionId snapshotConflictHorizon,
-					   long slot_idle_seconds)
+					   long slot_idle_seconds,
+					   TransactionId xmin,
+					   TransactionId catalog_xmin,
+					   TransactionId recentXid)
 {
 	StringInfoData err_detail;
 	StringInfoData err_hint;
@@ -1825,6 +1835,30 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 								 "idle_replication_slot_timeout");
 				break;
 			}
+
+		case RS_INVAL_XID_AGE:
+			{
+				Assert(TransactionIdIsValid(xmin) || TransactionIdIsValid(catalog_xmin));
+
+				if (TransactionIdIsValid(xmin))
+				{
+					/* translator: %s is a GUC variable name */
+					appendStringInfo(&err_detail, _("The slot's xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
+									 xmin, (int32) (recentXid - xmin), "max_slot_xid_age", max_slot_xid_age);
+				}
+				else
+				{
+					/* translator: %s is a GUC variable name */
+					appendStringInfo(&err_detail, _("The slot's catalog_xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
+									 catalog_xmin, (int32) (recentXid - catalog_xmin), "max_slot_xid_age", max_slot_xid_age);
+				}
+
+				/* translator: %s is a GUC variable name */
+				appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
+								 "max_slot_xid_age");
+				break;
+			}
+
 		case RS_INVAL_NONE:
 			pg_unreachable();
 	}
@@ -1863,6 +1897,25 @@ CanInvalidateIdleSlot(ReplicationSlot *s)
 			!(RecoveryInProgress() && s->data.synced));
 }
 
+/*
+ * Can we invalidate an XID-aged replication slot?
+ *
+ * XID-aged based invalidation is allowed to the given slot when:
+ *
+ * 1. Max XID-age is set
+ * 2. Slot has valid xmin or catalog_xmin
+ * 3. The slot is not being synced from the primary while the server is in
+ *	  recovery.
+ */
+static inline bool
+CanInvalidateXidAgedSlot(ReplicationSlot *s)
+{
+	return (max_slot_xid_age != 0 &&
+			(TransactionIdIsValid(s->data.xmin) ||
+			 TransactionIdIsValid(s->data.catalog_xmin)) &&
+			!(RecoveryInProgress() && s->data.synced));
+}
+
 /*
  * DetermineSlotInvalidationCause - Determine the cause for which a slot
  * becomes invalid among the given possible causes.
@@ -1874,6 +1927,7 @@ static ReplicationSlotInvalidationCause
 DetermineSlotInvalidationCause(uint32 possible_causes, ReplicationSlot *s,
 							   XLogRecPtr oldestLSN, Oid dboid,
 							   TransactionId snapshotConflictHorizon,
+							   TransactionId xidLimit,
 							   TimestampTz *inactive_since, TimestampTz now)
 {
 	Assert(possible_causes != RS_INVAL_NONE);
@@ -1945,6 +1999,18 @@ DetermineSlotInvalidationCause(uint32 possible_causes, ReplicationSlot *s,
 		}
 	}
 
+	/* Check if the slot needs to be invalidated due to max_slot_xid_age GUC */
+	if ((possible_causes & RS_INVAL_XID_AGE) && CanInvalidateXidAgedSlot(s))
+	{
+		Assert(TransactionIdIsValid(xidLimit));
+
+		if ((TransactionIdIsValid(s->data.xmin) &&
+			 TransactionIdPrecedes(s->data.xmin, xidLimit)) ||
+			(TransactionIdIsValid(s->data.catalog_xmin) &&
+			 TransactionIdPrecedes(s->data.catalog_xmin, xidLimit)))
+			return RS_INVAL_XID_AGE;
+	}
+
 	return RS_INVAL_NONE;
 }
 
@@ -1967,12 +2033,19 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 							   ReplicationSlot *s,
 							   XLogRecPtr oldestLSN,
 							   Oid dboid, TransactionId snapshotConflictHorizon,
+							   TransactionId recentXid,
 							   bool *released_lock_out)
 {
 	int			last_signaled_pid = 0;
 	bool		released_lock = false;
 	bool		invalidated = false;
 	TimestampTz inactive_since = 0;
+	TransactionId xidLimit = InvalidTransactionId;
+
+	/* Compute the XID limit once, to avoid redundant work per slot */
+	if ((possible_causes & RS_INVAL_XID_AGE) &&
+		TransactionIdIsValid(recentXid))
+		xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
 
 	for (;;)
 	{
@@ -2019,6 +2092,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 																s, oldestLSN,
 																dboid,
 																snapshotConflictHorizon,
+																xidLimit,
 																&inactive_since,
 																now);
 
@@ -2112,7 +2186,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 				ReportSlotInvalidation(invalidation_cause, true, active_pid,
 									   slotname, restart_lsn,
 									   oldestLSN, snapshotConflictHorizon,
-									   slot_idle_secs);
+									   slot_idle_secs, s->data.xmin,
+									   s->data.catalog_xmin, recentXid);
 
 				if (MyBackendType == B_STARTUP)
 					(void) SignalRecoveryConflict(GetPGProcByNumber(active_proc),
@@ -2165,7 +2240,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 			ReportSlotInvalidation(invalidation_cause, false, active_pid,
 								   slotname, restart_lsn,
 								   oldestLSN, snapshotConflictHorizon,
-								   slot_idle_secs);
+								   slot_idle_secs, s->data.xmin,
+								   s->data.catalog_xmin, recentXid);
 
 			/* done with this slot for now */
 			break;
@@ -2192,6 +2268,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
  *   logical.
  * - RS_INVAL_IDLE_TIMEOUT: has been idle longer than the configured
  *   "idle_replication_slot_timeout" duration.
+ * - RS_INVAL_XID_AGE: slot xid age is older than the configured
+ *   "max_slot_xid_age" age.
  *
  * Note: This function attempts to invalidate the slot for multiple possible
  * causes in a single pass, minimizing redundant iterations. The "cause"
@@ -2205,7 +2283,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 bool
 InvalidateObsoleteReplicationSlots(uint32 possible_causes,
 								   XLogSegNo oldestSegno, Oid dboid,
-								   TransactionId snapshotConflictHorizon)
+								   TransactionId snapshotConflictHorizon,
+								   TransactionId recentXid)
 {
 	XLogRecPtr	oldestLSN;
 	bool		invalidated = false;
@@ -2244,7 +2323,7 @@ restart:
 
 		if (InvalidatePossiblyObsoleteSlot(possible_causes, s, oldestLSN,
 										   dboid, snapshotConflictHorizon,
-										   &released_lock))
+										   recentXid, &released_lock))
 		{
 			Assert(released_lock);
 
@@ -3275,3 +3354,44 @@ WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
 
 	ConditionVariableCancelSleep();
 }
+
+/*
+ * Invalidate replication slots whose XID age exceeds the max_slot_xid_age
+ * GUC.
+ *
+ * The slot_xmin and slot_catalog_xmin are the replication slot xmin values
+ * obtained from the same ComputeXidHorizons() call that computed OldestXmin
+ * during vacuum. Using these avoids a separate ProcArrayLock acquisition.
+ *
+ * Returns true if at least one slot was invalidated.
+ */
+bool
+MaybeInvalidateXIDAgedSlots(TransactionId slot_xmin,
+							TransactionId slot_catalog_xmin)
+{
+	TransactionId recentXid;
+	TransactionId xidLimit;
+	bool		invalidated = false;
+
+	if (max_slot_xid_age == 0)
+		return false;
+
+	recentXid = ReadNextTransactionId();
+	xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
+
+	/*
+	 * Invalidate possibly obsolete slots based on XID-age, if either slot's
+	 * xmin or catalog_xmin is older than the cutoff.
+	 */
+	if ((TransactionIdIsValid(slot_xmin) &&
+		 TransactionIdPrecedes(slot_xmin, xidLimit)) ||
+		(TransactionIdIsValid(slot_catalog_xmin) &&
+		 TransactionIdPrecedes(slot_catalog_xmin, xidLimit)))
+		invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
+														 0,
+														 InvalidOid,
+														 InvalidTransactionId,
+														 recentXid);
+
+	return invalidated;
+}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index cc207cb56e3..9e0acf7309d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1937,6 +1937,30 @@ GlobalVisHorizonKindForRel(Relation rel)
 		return VISHORIZON_TEMP;
 }
 
+/*
+ * Helper to return the appropriate oldest non-removable TransactionId from
+ * pre-computed horizons, based on the relation type.
+ */
+static inline TransactionId
+GetOldestNonRemovableTransactionIdFromHorizons(ComputeXidHorizonsResult *horizons,
+											   Relation rel)
+{
+	switch (GlobalVisHorizonKindForRel(rel))
+	{
+		case VISHORIZON_SHARED:
+			return horizons->shared_oldest_nonremovable;
+		case VISHORIZON_CATALOG:
+			return horizons->catalog_oldest_nonremovable;
+		case VISHORIZON_DATA:
+			return horizons->data_oldest_nonremovable;
+		case VISHORIZON_TEMP:
+			return horizons->temp_oldest_nonremovable;
+	}
+
+	/* just to prevent compiler warnings */
+	return InvalidTransactionId;
+}
+
 /*
  * Return the oldest XID for which deleted tuples must be preserved in the
  * passed table.
@@ -1955,20 +1979,30 @@ GetOldestNonRemovableTransactionId(Relation rel)
 
 	ComputeXidHorizons(&horizons);
 
-	switch (GlobalVisHorizonKindForRel(rel))
-	{
-		case VISHORIZON_SHARED:
-			return horizons.shared_oldest_nonremovable;
-		case VISHORIZON_CATALOG:
-			return horizons.catalog_oldest_nonremovable;
-		case VISHORIZON_DATA:
-			return horizons.data_oldest_nonremovable;
-		case VISHORIZON_TEMP:
-			return horizons.temp_oldest_nonremovable;
-	}
+	return GetOldestNonRemovableTransactionIdFromHorizons(&horizons, rel);
+}
 
-	/* just to prevent compiler warnings */
-	return InvalidTransactionId;
+/*
+ * Same as GetOldestNonRemovableTransactionId(), but also returns the
+ * replication slot xmin and catalog_xmin from the same ComputeXidHorizons()
+ * call.  This avoids a separate ProcArrayLock acquisition when the caller
+ * needs both values.
+ */
+TransactionId
+GetOldestNonRemovableTransactionIdExt(Relation rel,
+									  TransactionId *slot_xmin,
+									  TransactionId *slot_catalog_xmin)
+{
+	ComputeXidHorizonsResult horizons;
+
+	ComputeXidHorizons(&horizons);
+
+	if (slot_xmin)
+		*slot_xmin = horizons.slot_xmin;
+	if (slot_catalog_xmin)
+		*slot_catalog_xmin = horizons.slot_catalog_xmin;
+
+	return GetOldestNonRemovableTransactionIdFromHorizons(&horizons, rel);
 }
 
 /*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index de9092fdf5b..d60f39ec08e 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -504,7 +504,8 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 */
 	if (IsLogicalDecodingEnabled() && isCatalogRel)
 		InvalidateObsoleteReplicationSlots(RS_INVAL_HORIZON, 0, locator.dbOid,
-										   snapshotConflictHorizon);
+										   snapshotConflictHorizon,
+										   InvalidTransactionId);
 }
 
 /*
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0a862693fcd..ca3cc8417da 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -2089,6 +2089,14 @@
   max => 'MAX_KILOBYTES',
 },
 
+{ name => 'max_slot_xid_age', type => 'int', context => 'PGC_SIGHUP', group => 'REPLICATION_SENDING',
+  short_desc => 'Age of the transaction ID at which a replication slot gets invalidated.',
+  variable => 'max_slot_xid_age',
+  boot_val => '0',
+  min => '0',
+  max => '2000000000',
+},
+
 # We use the hopefully-safely-small value of 100kB as the compiled-in
 # default for max_stack_depth.  InitializeGUCOptions will increase it
 # if possible, depending on the actual platform-specific stack limit.
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf15597385b..055eba56bdf 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -351,6 +351,8 @@
 #wal_keep_size = 0              # in megabytes; 0 disables
 #max_slot_wal_keep_size = -1    # in megabytes; -1 disables
 #idle_replication_slot_timeout = 0      # in seconds; 0 disables
+#max_slot_xid_age = 0           # maximum XID age before a replication slot
+                                # gets invalidated; 0 disables
 #wal_sender_timeout = 60s       # in milliseconds; 0 disables
 #track_commit_timestamp = off   # collect timestamp of transaction commit
                                 # (change requires restart)
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 5d351f0df33..38e8e05567f 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -287,6 +287,15 @@ struct VacuumCutoffs
 	 */
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
+
+	/*
+	 * Replication slot xmin and catalog_xmin values obtained from the same
+	 * ComputeXidHorizons() call that computed OldestXmin. These are used for
+	 * XID-age-based replication slot invalidation without requiring an extra
+	 * ProcArrayLock acquisition.
+	 */
+	TransactionId slot_xmin;
+	TransactionId slot_catalog_xmin;
 };
 
 /*
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 4b4709f6e2c..0baa7112559 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -66,10 +66,12 @@ typedef enum ReplicationSlotInvalidationCause
 	RS_INVAL_WAL_LEVEL = (1 << 2),
 	/* idle slot timeout has occurred */
 	RS_INVAL_IDLE_TIMEOUT = (1 << 3),
+	/* slot's xmin or catalog_xmin has reached max xid age */
+	RS_INVAL_XID_AGE = (1 << 4),
 } ReplicationSlotInvalidationCause;
 
 /* Maximum number of invalidation causes */
-#define	RS_INVAL_MAX_CAUSES 4
+#define	RS_INVAL_MAX_CAUSES 5
 
 /*
  * When the slot synchronization worker is running, or when
@@ -326,6 +328,7 @@ extern PGDLLIMPORT ReplicationSlot *MyReplicationSlot;
 extern PGDLLIMPORT int max_replication_slots;
 extern PGDLLIMPORT char *synchronized_standby_slots;
 extern PGDLLIMPORT int idle_replication_slot_timeout_secs;
+extern PGDLLIMPORT int max_slot_xid_age;
 
 /* shmem initialization functions */
 extern Size ReplicationSlotsShmemSize(void);
@@ -367,7 +370,10 @@ extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern bool InvalidateObsoleteReplicationSlots(uint32 possible_causes,
 											   XLogSegNo oldestSegno,
 											   Oid dboid,
-											   TransactionId snapshotConflictHorizon);
+											   TransactionId snapshotConflictHorizon,
+											   TransactionId recentXid);
+extern bool MaybeInvalidateXIDAgedSlots(TransactionId slot_xmin,
+										TransactionId slot_catalog_xmin);
 extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern int	ReplicationSlotIndex(ReplicationSlot *slot);
 extern bool ReplicationSlotName(int index, Name name);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index abdf021e66e..c198fd22515 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -53,6 +53,9 @@ extern RunningTransactions GetRunningTransactionData(void);
 
 extern bool TransactionIdIsInProgress(TransactionId xid);
 extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
+extern TransactionId GetOldestNonRemovableTransactionIdExt(Relation rel,
+														   TransactionId *slot_xmin,
+														   TransactionId *slot_catalog_xmin);
 extern TransactionId GetOldestTransactionIdConsideredRunning(void);
 extern TransactionId GetOldestActiveTransactionId(bool inCommitOnly,
 												  bool allDbs);
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7b253e64d9c..e82bef93879 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -540,4 +540,182 @@ is( $publisher4->safe_psql(
 $publisher4->stop;
 $subscriber4->stop;
 
+# Advance the given number of XIDs
+sub advance_xids
+{
+	my ($node, $nxids) = @_;
+	my $sql = join(";\n", ("SELECT pg_current_xact_id()") x $nxids);
+	$node->safe_psql('postgres', $sql);
+}
+
+# Wait for the given slot to be invalidated with reason 'xid_aged'
+sub wait_for_xid_aged_invalidation
+{
+	my ($node, $slot_name) = @_;
+	$node->poll_query_until(
+		'postgres', qq[
+		SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+			WHERE slot_name = '$slot_name' AND
+			active = false AND
+			invalidation_reason = 'xid_aged';
+	]) or die "Timed out waiting for slot $slot_name to be invalidated";
+}
+
+# =====================================================================
+# Testcase start: Invalidate physical slot due to max_slot_xid_age GUC
+
+# Initialize primary node for XID age tests
+my $primary5 = PostgreSQL::Test::Cluster->new('primary5');
+$primary5->init(allows_streaming => 'logical');
+
+# Disable autovacuum so checkpointer triggers the invalidation
+my $max_slot_xid_age = 100;
+$primary5->append_conf(
+	'postgresql.conf', qq{
+max_slot_xid_age = $max_slot_xid_age
+autovacuum = off
+});
+
+$primary5->start;
+
+# Take a backup for creating standby
+$backup_name = 'backup5';
+$primary5->backup($backup_name);
+
+# Create standby with HS feedback so the slot gains an xmin
+my $standby5 = PostgreSQL::Test::Cluster->new('standby5');
+$standby5->init_from_backup($primary5, $backup_name, has_streaming => 1);
+$standby5->append_conf(
+	'postgresql.conf', q{
+primary_slot_name = 'sb5_slot'
+hot_standby_feedback = on
+wal_receiver_status_interval = 1
+});
+$primary5->safe_psql(
+	'postgres', qq[
+    SELECT pg_create_physical_replication_slot(slot_name := 'sb5_slot', immediately_reserve := true);
+]);
+$standby5->start;
+
+# Create some content on primary to move xmin
+$primary5->safe_psql('postgres',
+	"CREATE TABLE tab_int5 AS SELECT generate_series(1,10) AS a");
+$primary5->wait_for_catchup($standby5);
+
+# Wait for the physical slot to get xmin via hot_standby_feedback
+$primary5->poll_query_until(
+	'postgres', qq[
+	SELECT xmin IS NOT NULL
+		FROM pg_catalog.pg_replication_slots
+		WHERE slot_name = 'sb5_slot';
+]) or die "Timed out waiting for slot sb5_slot xmin from HS feedback";
+
+# Stop standby so the slot becomes inactive with its xmin frozen
+$standby5->stop;
+
+# Advance XIDs past 2x max_slot_xid_age so the slot's xmin is stale enough
+advance_xids($primary5, 2 * $max_slot_xid_age);
+$primary5->safe_psql('postgres', "CHECKPOINT");
+wait_for_xid_aged_invalidation($primary5, 'sb5_slot');
+ok(1, "physical slot invalidated due to XID age (via checkpoint)");
+
+# Testcase end: Invalidate physical slot due to max_slot_xid_age GUC
+# ===================================================================
+
+# ====================================================================
+# Testcase start: Invalidate logical slot due to max_slot_xid_age GUC
+
+# Re-enable autovacuum so that VACUUM-triggered invalidation works normally
+$primary5->safe_psql('postgres',
+	"ALTER SYSTEM SET autovacuum = on; SELECT pg_reload_conf();");
+
+# Create a subscriber node
+my $subscriber5 = PostgreSQL::Test::Cluster->new('subscriber5');
+$subscriber5->init(allows_streaming => 'logical');
+$subscriber5->start;
+
+# Create tables on both primary and subscriber
+$primary5->safe_psql('postgres', "CREATE TABLE test_tbl5 (id int)");
+$subscriber5->safe_psql('postgres', "CREATE TABLE test_tbl5 (id int)");
+$primary5->safe_psql('postgres',
+	"INSERT INTO test_tbl5 VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $primary5_connstr = $primary5->connstr . ' dbname=postgres';
+$primary5->safe_psql('postgres',
+	"CREATE PUBLICATION pub5 FOR TABLE test_tbl5");
+$subscriber5->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub5 CONNECTION '$primary5_connstr' PUBLICATION pub5 WITH (slot_name = 'lsub5_slot')"
+);
+
+# Wait for initial sync
+$subscriber5->wait_for_subscription_sync($primary5, 'sub5');
+
+$result = $subscriber5->safe_psql('postgres', "SELECT count(*) FROM test_tbl5");
+is($result, qq(5), "check initial copy was done for logical replication (XID age test)");
+
+# Wait for the logical slot to get catalog_xmin
+$primary5->poll_query_until(
+	'postgres', qq[
+	SELECT xmin IS NULL AND catalog_xmin IS NOT NULL
+	FROM pg_catalog.pg_replication_slots
+	WHERE slot_name = 'lsub5_slot';
+]) or die "Timed out waiting for slot lsub5_slot catalog_xmin to advance";
+
+# Stop subscriber to make the slot inactive
+$subscriber5->stop;
+
+# Advance XIDs past 2x max_slot_xid_age so the slot's catalog_xmin is stale enough
+advance_xids($primary5, 2 * $max_slot_xid_age);
+$primary5->safe_psql('postgres', "VACUUM test_tbl5");
+wait_for_xid_aged_invalidation($primary5, 'lsub5_slot');
+ok(1, "logical slot invalidated due to XID age (via vacuum)");
+
+# Testcase end: Invalidate logical slot due to max_slot_xid_age GUC
+# ==================================================================
+
+# ===============================================================================
+# Testcase start: Invalidate logical slot on standby due to max_slot_xid_age GUC
+
+# Disable max_slot_xid_age on primary and recreate the streaming slot
+$primary5->safe_psql('postgres',
+	"ALTER SYSTEM SET max_slot_xid_age = 0; SELECT pg_reload_conf();");
+$primary5->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('sb5_slot')");
+$primary5->safe_psql('postgres',
+	"SELECT pg_create_physical_replication_slot('sb5_slot', true)");
+$standby5->append_conf(
+	'postgresql.conf', qq{
+max_slot_xid_age = $max_slot_xid_age
+autovacuum = off
+});
+$standby5->start;
+
+$primary5->wait_for_catchup($standby5);
+
+$standby5->create_logical_slot_on_standby($primary5, 'sb5_logical_slot',
+	'postgres');
+
+$standby5->poll_query_until(
+	'postgres', qq[
+	SELECT catalog_xmin IS NOT NULL
+	FROM pg_catalog.pg_replication_slots
+	WHERE slot_name = 'sb5_logical_slot';
+]) or die "Timed out waiting for sb5_logical_slot catalog_xmin";
+
+# Advance XIDs on primary, replay on standby, then restartpoint to invalidate
+advance_xids($primary5, 2 * $max_slot_xid_age);
+$primary5->safe_psql('postgres', "CHECKPOINT");
+$primary5->wait_for_catchup($standby5);
+$standby5->safe_psql('postgres', "CHECKPOINT");
+
+wait_for_xid_aged_invalidation($standby5, 'sb5_logical_slot');
+ok(1, "logical (standby) slot invalidated due to XID age (via restartpoint)");
+
+$standby5->stop;
+$primary5->stop;
+
+# Testcase end: Invalidate logical slot on standby due to max_slot_xid_age GUC
+# =============================================================================
+
 done_testing();
-- 
2.47.3



  [application/x-patch] v7-0002-Add-more-tests-for-XID-age-slot-invalidation.patch (6.4K, 3-v7-0002-Add-more-tests-for-XID-age-slot-invalidation.patch)
  download | inline diff:
From e7739c4b57c49c9dac14633831b11fa3c4f18181 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 31 Mar 2026 16:09:32 +0000
Subject: [PATCH v7 2/2] Add more tests for XID age slot invalidation

Consume XIDs up to wraparound WARNING limits with
max_slot_xid_age matching vacuum_failsafe_age (1.6B). Verify that
autovacuum invalidates the inactive replication slot
(XID-age-based invalidation), unblocks datfrozenxid advancement,
and prevents wraparound without any intervention.
---
 src/test/recovery/Makefile                |   3 +-
 src/test/recovery/t/019_replslot_limit.pl | 133 ++++++++++++++++++++++
 2 files changed, 135 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index d41aaaf8ae1..5c3d2c89941 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -12,7 +12,8 @@
 EXTRA_INSTALL=contrib/pg_prewarm \
 	contrib/pg_stat_statements \
 	contrib/test_decoding \
-	src/test/modules/injection_points
+	src/test/modules/injection_points \
+	src/test/modules/xid_wraparound
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index e82bef93879..63e87443edd 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -718,4 +718,137 @@ $primary5->stop;
 # Testcase end: Invalidate logical slot on standby due to max_slot_xid_age GUC
 # =============================================================================
 
+# =================================================================================
+# Testcase start: XID-age-based slot invalidation with autovacuum (production-like)
+
+# Standby sets slot xmin via HS feedback, disconnects, XIDs are consumed.
+# max_slot_xid_age is set to vacuum_failsafe_age (1.6B) so autovacuum
+# invalidates the slot before entering failsafe mode, unblocking
+# datfrozenxid advancement and avoiding XID wraparound without manual
+# VACUUM or downtime.
+
+# Verify server log shows slot invalidation by autovacuum worker
+sub verify_slot_xid_aged_invalidation_in_server_log
+{
+	my ($node, $slot_name, $slot_xmin, $max_age, $consumed_xids) = @_;
+
+	my $log = slurp_file($node->logfile);
+
+	# Verify the invalidation was performed by an autovacuum worker
+	like($log,
+		qr/autovacuum worker\[\d+\] LOG:\s+invalidating obsolete replication slot "$slot_name"/,
+		"server log: $slot_name invalidated by autovacuum worker");
+
+	# Verify DETAIL shows the correct xmin and max_slot_xid_age
+	like($log,
+		qr/autovacuum worker\[\d+\] DETAIL:\s+The slot's xmin $slot_xmin is (\d+) transactions old, which exceeds the configured "max_slot_xid_age" value of $max_age\./,
+		"server log: DETAIL shows xmin $slot_xmin and age $max_age");
+
+	# Extract xid age from the log and report for diagnostics
+	$log =~
+	  /The slot's xmin $slot_xmin is (\d+) transactions old/;
+	my $log_xid_age = $1 // 'N/A';
+	diag "xid_age from server log=$log_xid_age, max_slot_xid_age=$max_age, consumed=$consumed_xids XIDs";
+}
+
+# Verify slot invalidation and wait for autovacuum to advance datfrozenxid
+sub verify_invalidation_and_recovery
+{
+	my ($node, $slot_name, $slot_xmin, $max_age, $consumed_xids) = @_;
+
+	return if $max_age == 0;
+
+	wait_for_xid_aged_invalidation($node, $slot_name);
+	ok(1, 'autovacuum invalidated slot due to xid_aged');
+
+	verify_slot_xid_aged_invalidation_in_server_log($node, $slot_name,
+		$slot_xmin, $max_age, $consumed_xids);
+
+	# Wait for autovacuum to advance datfrozenxid in all databases past the
+	# wraparound threshold.
+	$node->poll_query_until(
+		'postgres', qq[
+		SELECT NOT EXISTS (
+			SELECT 1 FROM pg_database
+			WHERE age(datfrozenxid) > 2000000000
+		);
+	]) or die "Timed out waiting for autovacuum to advance datfrozenxid in all databases";
+}
+
+my $primary6 = PostgreSQL::Test::Cluster->new('primary6');
+$primary6->init(allows_streaming => 'logical');
+
+$max_slot_xid_age = 1600000000;    # matches vacuum_failsafe_age default
+$primary6->append_conf(
+	'postgresql.conf', qq{
+max_slot_xid_age = $max_slot_xid_age
+autovacuum_naptime = 1s
+});
+
+$primary6->start;
+$primary6->safe_psql('postgres', "CREATE EXTENSION xid_wraparound");
+
+$backup_name = 'backup6';
+$primary6->backup($backup_name);
+
+my $standby6 = PostgreSQL::Test::Cluster->new('standby6');
+$standby6->init_from_backup($primary6, $backup_name, has_streaming => 1);
+$standby6->append_conf(
+	'postgresql.conf', q{
+primary_slot_name = 'sb6_slot'
+hot_standby_feedback = on
+wal_receiver_status_interval = 1
+});
+
+$primary6->safe_psql('postgres',
+	"SELECT pg_create_physical_replication_slot('sb6_slot', true)");
+
+$standby6->start;
+
+$primary6->safe_psql('postgres',
+	"CREATE TABLE tab_int6 AS SELECT generate_series(1,10) AS a");
+$primary6->wait_for_catchup($standby6);
+
+$primary6->poll_query_until(
+	'postgres', qq[
+	SELECT xmin IS NOT NULL FROM pg_replication_slots
+		WHERE slot_name = 'sb6_slot';
+]) or die "Timed out waiting for sb6_slot xmin from HS feedback";
+
+# Capture the slot's xmin for later log verification
+my $slot_xmin = $primary6->safe_psql('postgres',
+	"SELECT xmin FROM pg_replication_slots WHERE slot_name = 'sb6_slot'");
+
+# Stop standby; slot xmin persists and holds back datfrozenxid
+$standby6->stop;
+
+# Consume XIDs in 50M chunks; autovacuum (naptime=1s) will invalidate the
+# slot once xmin age exceeds max_slot_xid_age.
+my $logstart6 = -s $primary6->logfile;
+my $chunk = 50_000_000;
+my $max_xids = 2_200_000_000;
+my $consumed = 0;
+
+while ($consumed < $max_xids)
+{
+	$primary6->safe_psql('postgres', "SELECT consume_xids($chunk)");
+	$consumed += $chunk;
+	my $remaining = $max_xids - $consumed;
+	diag "consumed $consumed / $max_xids XIDs ($remaining remaining)";
+}
+
+verify_invalidation_and_recovery($primary6, 'sb6_slot',
+	$slot_xmin, $max_slot_xid_age, $consumed);
+
+# Consume 1B more XIDs — combining with the 2.2B consumed above, the total
+# of 3.2B exceeds the 2^31 (~2.1B) usable XID space (xidStopLimit), i.e.
+# more than one full wraparound cycle, proving the system is healthy.
+$primary6->safe_psql('postgres', "SELECT consume_xids(1000000000)");
+ok(1, 'writes succeed after autovacuum invalidated the slot');
+
+$primary6->stop;
+
+# Testcase end: XID-age-based slot invalidation with autovacuum (production-like)
+# ================================================================================
+
 done_testing();
-- 
2.47.3



view thread (31+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Introduce XID age based replication slot invalidation
  In-Reply-To: <CALj2ACVGpVHuqchPPFWdiLDN-PDPCEe=sU43YB7nqafE+VMXaQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox