Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w7Mjv-005E96-00 for pgsql-hackers@arkaria.postgresql.org; Tue, 31 Mar 2026 00:13:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w7Mjs-007IR9-1C for pgsql-hackers@arkaria.postgresql.org; Tue, 31 Mar 2026 00:13:48 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w7Mjs-007IQu-09 for pgsql-hackers@lists.postgresql.org; Tue, 31 Mar 2026 00:13:48 +0000 Received: from mail-lf1-x129.google.com ([2a00:1450:4864:20::129]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w7Mjp-000000025EV-2p6z for pgsql-hackers@postgresql.org; Tue, 31 Mar 2026 00:13:48 +0000 Received: by mail-lf1-x129.google.com with SMTP id 2adb3069b0e04-5a2a5236811so4054924e87.1 for ; Mon, 30 Mar 2026 17:13:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774916025; cv=none; d=google.com; s=arc-20240605; b=K3bbndBMCdylGopuQFduNEEs0Lw/vm/Ur9k6KR7uE6GY6kWAR+ua9pbZj7RlGghR7f 8ra5y0BLGCrDgnv4T5v/SqstAhJO0ooFZu0qOMDCLBQGmf8m8XrSznCPxQRHg1wdjeOp CHVQZQRPh+hmHh2Y7KbygKJgM+ZXiBq4xmr7g74l0k6u/OXUoSlliYwKM8zZULAaFLKI THeUDVOOaHqGpiMq73qsYoJAsAB5rhi+XYZPp+xzBlUCQk0oRs97Y8gZa8pFFdwK7R9J DuylbgBnRjvnvi6JX4YnbXFwJQ7eqEJnubxK3FmVYqc4LbR/FQEWhiURM4t7a/U/hshc T9LA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=RpnLXy+hBJm8JdRiolzfKaVO7LqeO5btXayYAxZdp3E=; fh=f3L0sJLLGkp40zdyKWQvEH4EPqx6up9FT1D3rMMi8W4=; b=ZWJK62aaPY+GsbXKZzyk4F33T2na6YYuGqOdhrpRiVdSAqKwTRC23mZGC8xxUxaJAn YceynWP6ZN9QjiftBbEbwigVO+lOOAn8FcnfZWw0Ns45ICi3I9wGWNA+Oz6X2Z5VFsXR jJHY/toT+c/8zRBbgeNyC2IPjkdLOglX/1aybhu2RQzW51uVGTBl91plJURd72SQIYn0 PYcHyrQnoQgRCtlNPh2L3ZjQ6ixwB37G7WqiGmCsLEAopkI3TYOogswhJtEYB+uxhBHV ZWq+WzgKZycFw185Fk/Yt5xZzAezxwmMi/K2UYVBgop2l4422mNYNF+g7xssrDlkqiUx n6zw==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774916025; x=1775520825; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=RpnLXy+hBJm8JdRiolzfKaVO7LqeO5btXayYAxZdp3E=; b=A5fjw/konbnZwWBEvIgFiTNmJxhDsj3OQRnVJSScUoHrg1kVS4kc6ZysayRbQ9wnSB far1wX9oqHF4x+FPbnBm3St4KYS4Kb1cUgpxKBNsdbpRHzhx9sFjJLFeBrzBk0dLa/mI +QDWzBYluQGMUhHm5ADowCyzVakkr+GeWvqO68Pw8oFMu1kmbBFV2SPZ1P14/vdZSQI5 3EgO4eaArcqS9OCcFc1zSaJK1U/OrnHL1iweSUYRek0G9rEwNcHU/f6eMMAfhKa9jAw7 TdPK5xTQczwCVA29I52vTuHW8YY2h0esPBJrO3nyfZP0kJmxIKvGfbaoDouqN9HJFPiG c7yA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774916025; x=1775520825; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=RpnLXy+hBJm8JdRiolzfKaVO7LqeO5btXayYAxZdp3E=; b=hVyhJas3Tsq2Kx6n7PbmLe11NgaiDIiVc44UrPmdyCT05FS1PaFnAYDEKF9tTL8Dtw pYqwcnxqblynO4pMipofltdqX3rM2L8RorOokmjCQCrwCNPgyPo2ENCSPQSmgVCQ3DVv XrRgHvETnUezH0evo93TojE37bhwSb5pjoGnnjCYTqFMKIiQbOkQdfZujg/CaErNGROz uDVZlGGWdu/v5rfHnQD/m6GezDg72/6oXvNkjw/1qppZIqgomdNPK/q/Yv1y+fSodjjW tM/tmZO6xZI36SIsMU2VJ2LzXJum9ZCJVmhvnBGC63MO0HnZcg6tNUrsRTW23VP3vgUR hU6Q== X-Forwarded-Encrypted: i=1; AJvYcCVZlTBeiGV5scsMyOMo7AnnP3tPTZ1dz3m1QPTXS3sqrM1SxQryzdtVXHo4DUBDfSeC0mE63tAz7d7yNC+b@postgresql.org X-Gm-Message-State: AOJu0Yz6inPteTvRmj3uzHl+3GPV2jco3ciVK90KtU9oBinKIMJVnqu4 rONkCKQn40qI8fWVQlqOkUeKNYbLW0pHWpy6WM7S/6sRPNdrULaEyH6oqshFTvQnf5BxGnNYile 1j2awPdtM0rh91CXhamkmRzjd56TGWQw= X-Gm-Gg: ATEYQzyK9sxHxTvhEWpkRnEQgFS9hp7JUN/lh98b98NSFnc4282DLcA9lehPYO7U9Ae FkM726A2SyMaP6NcwJjerTEpkhS8U8vTNNMwuMJHq34YXsVy3W1aDBrYzFamqgVeMe9I1n0rJjw 0nXB8TnjBjCz6V9XZYlzmDd/WN2GPXSKD5NmTyGrHDRq9maS7BGvWsE8R4kl6KMA1TQTJpJoO5U HZL1Cs8BPBYyNoMD4BfSsA+gWH9JZOVCDSq0r2PGd03fBWpqG4Mzby+prYUvCJOnyIiJ9jiCYVU jxIKSlBK X-Received: by 2002:a05:6512:3256:b0:5a2:afb0:8b8b with SMTP id 2adb3069b0e04-5a2afb08ea4mr2800667e87.18.1774916024412; Mon, 30 Mar 2026 17:13:44 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Masahiko Sawada Date: Mon, 30 Mar 2026 17:13:08 -0700 X-Gm-Features: AQROBzBxCtyERMyBY2Y8azg-sUdFLVu7TpnsZ_riFv_vFm9YQ7Hr0r-n69hBCAQ Message-ID: Subject: Re: Introduce XID age based replication slot invalidation To: Bharath Rupireddy Cc: Srinath Reddy Sadipiralla , SATYANARAYANA NARLAPURAM , "Hayato Kuroda (Fujitsu)" , John H , PostgreSQL-development Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sun, Mar 29, 2026 at 6:35=E2=80=AFPM Bharath Rupireddy wrote: > > Hi, > > On Sun, Mar 29, 2026 at 1:16=E2=80=AFPM Srinath Reddy Sadipiralla > wrote: > > > > Hello, > > > > Thanks for the v5 patch set, I have reviewed and did initial testing on > > v5 patch set, and it LGTM, except these > > Thank you for reviewing and testing. I appreciate it. > > > diff --git a/src/backend/replication/slot.c b/src/backend/replication/s= lot.c > > index 286f0f46341..c2ff7e464f0 100644 > > --- a/src/backend/replication/slot.c > > +++ b/src/backend/replication/slot.c > > @@ -1849,7 +1849,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidatio= nCause cause, > > else > > { > > /* translator: %s is a GUC vari= able name */ > > - appendStringInfo(&err_detail, _= ("The slot's xmin %u is %d transactions old, which exceeds the configured \= "%s\" value of %d."), > > + appendStringInfo(&err_detail, _= ("The slot's catalog_xmin %u is %d transactions old, which exceeds the conf= igured \"%s\" value of %d."), > > = catalog_xmin, (int32) (recentXid - catalog_xmin), "max_slot_xid_age", max= _slot_xid_age); > > } > > Fixed the typo. > > > while testing the active slot XID age invalidation (SIGTERM path) , i > > observed that slot got invalidated , walsender was killed because of > > SIGTERM , then starts the infinite-retry-cycle problem where > > walreceiver starts walsender and walsender will try to use an invalidat= ed > > slot and dies, will think more on this. > > I would like to clarify that once a slot is invalidated due to any of > the reasons (ReplicationSlotInvalidationCause), it becomes unusable; > the sender will error out if the receiver tries to use it. This is > consistent with all existing slot invalidation mechanisms. > > Please find the attached v6 patches fixing the typo for further review. > I've reviewed the v6 patch. Here are some comments. bool vacuum_get_cutoffs(Relation rel, const VacuumParams params, - struct VacuumCutoffs *cutoffs) + struct VacuumCutoffs *cutoffs, + TransactionId *slot_xmin, + TransactionId *slot_catalog_xmin) How about storing both slot_xmin and catalog_xmin into VacuumCutoffs? --- - if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT, + possibleInvalidationCauses =3D RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIM= EOUT | + RS_INVAL_XID_AGE; + + if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses, _logSegNo, InvalidOid, + InvalidTransactionId, + max_slot_xid_age > 0 ? + ReadNextTransactionId() : InvalidTransactionId)) It's odd to me that we specify RS_INVAL_XID_AGE while passing InvalidTransactionId. I think we can specify RS_INVAL_XID_AGE along with a valid recentXId only when we'd like to check the slots based on their XIDs. --- + /* Check if the slot needs to be invalidated due to max_slot_xid_age GU= C */ + if ((possible_causes & RS_INVAL_XID_AGE) && CanInvalidateXidAgedSlot(s)= ) + { + TransactionId xidLimit; + + Assert(TransactionIdIsValid(recentXid)); + + xidLimit =3D TransactionIdRetreatedBy(recentXid, max_slot_xid_age); + I think we can avoid calculating xidLimit for every slot by calculating it in InvalidatePossiblyObsoleteSlot() and passing it to DetermineSlotInvalidationCause(). --- */ TransactionId GetOldestNonRemovableTransactionId(Relation rel) +{ + return GetOldestNonRemovableTransactionIdExt(rel, NULL, NULL); +} + +/* + * Same as GetOldestNonRemovableTransactionId(), but also returns the + * replication slot xmin and catalog_xmin from the same ComputeXidHorizons= () + * call. This avoids a separate ProcArrayLock acquisition when the caller + * needs both values. + */ +TransactionId +GetOldestNonRemovableTransactionIdExt(Relation rel, + TransactionId *slot_xmin, + TransactionId *slot_catalog_xmin) { I understand that the primary reason why the patch introduces another variant of GetOldestNonRemovableTransactionId() is to avoid extra ProcArrayLock acquision to get replication slot xmin and catalog_xmin. While it's not very elegant, I find that it would not be bad because otherwise autovacuum takes extra ProcArrayLock (in shared mode) for every table to vacuum. The ProcArrayLock is already known high-contented lock it would be better to avoid taking it once more. If others think differently, we can just call ProcArrayGetReplicationSlotXmin() separately and compare them to the limit of XID-age based slot invalidation. Having said that, I personally don't want to add new instructions to the existing GetOldestNonRemovableTransactionId(). I guess we might want to make both the existing function and new function call a common (inline) function that takes ComputeXidHorizonsResult and returns appropriate transaction id based on the given relation . --- + # Do some work to advance xids + $node->safe_psql( + 'postgres', qq[ + do \$\$ + begin + for i in 1..$nxids loop + -- use an exception block so that each iteration eats an XID + begin + insert into $table_name values (i); + exception + when division_by_zero then null; + end; + end loop; + end\$\$; + ]); I think it's fater to use pg_current_xact_id() instead. --- + else + { + $node->safe_psql('postgres', "VACUUM"); + } We don't need to vacuum all tables here. --- +# Configure primary with XID age settings. Set autovacuum_naptime high so +# that the checkpointer (not vacuum) triggers the invalidation. +my $max_slot_xid_age =3D 500; +$primary5->append_conf( + 'postgresql.conf', qq{ +max_slot_xid_age =3D $max_slot_xid_age +autovacuum_naptime =3D '1h' +}); I think that it's better to disable autovacuum than setting a large number. --- +# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_ag= e +# GUC (via checkpoint). I think that we can say "physical slot" instead of standby's slot to avoid confusion as I thought standby's slot is a slot created on the standby at the first glance. --- Do we have tests for invalidating slots on the standbys? Regards, --=20 Masahiko Sawada Amazon Web Services: https://aws.amazon.com