Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wVOe9-001vnD-2O for pgsql-hackers@arkaria.postgresql.org; Fri, 05 Jun 2026 07:07:14 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wVOe8-00AY3R-2E for pgsql-hackers@arkaria.postgresql.org; Fri, 05 Jun 2026 07:07:12 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wVOe7-00AY3J-2K for pgsql-hackers@lists.postgresql.org; Fri, 05 Jun 2026 07:07:12 +0000 Received: from forwardcorp1d.mail.yandex.net ([178.154.239.200]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wVOe2-00000001CDb-1Vd4 for pgsql-hackers@lists.postgresql.org; Fri, 05 Jun 2026 07:07:10 +0000 Received: from mail-nwsmtp-smtp-corp-main-68.klg.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-68.klg.yp-c.yandex.net [IPv6:2a02:6b8:c42:94a9:0:640:a3fa:0]) by forwardcorp1d.mail.yandex.net (Yandex) with ESMTPS id 6DA4880B48; Fri, 05 Jun 2026 10:07:01 +0300 (MSK) Received: from smtpclient.apple (unknown [2a02:6bf:8080:578::1:1d]) by mail-nwsmtp-smtp-corp-main-68.klg.yp-c.yandex.net (smtpcorp) with ESMTPSA id x6cDGD3XN0U0-970wjmXu; Fri, 05 Jun 2026 10:07:00 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1780643220; bh=1ZkregjaZHnwe4u9VBXjP9Xf32bnJTsNkfMalBUvFEA=; h=To:Cc:Date:Message-Id:From:Subject; b=E1q/ReogaBxtKpPixdazJ2h2bXjaY9Gp8f1+sK9zKLAjk9t0NfpSI9OtDurTmQTzs jYWQQkw/rJzViyIMJ4fTeUKX7sHwse2g9KAf3NZ6uPPWzHuxgVmma9Grf5nDRf378m 1j1ELHhmn4/rtmzJ1dOgZB2hA8dKjiihTaaWlDZY= Authentication-Results: mail-nwsmtp-smtp-corp-main-68.klg.yp-c.yandex.net; dkim=pass header.i=@yandex-team.ru From: Andrey Borodin Content-Type: multipart/mixed; boundary="Apple-Mail=_7CF3B2EF-A258-40FC-8984-55A2CDC2E2FE" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.600.51.1.1\)) Subject: Archive-fed logical decoding: pausing recovery on slot conflict Message-Id: <967DBEA9-E8B7-4705-AD36-447D839AACA9@yandex-team.ru> Date: Fri, 5 Jun 2026 12:06:48 +0500 Cc: Kirk Wolak , nik@postgres.ai To: pgsql-hackers mailing list X-Mailer: Apple Mail (2.3864.600.51.1.1) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_7CF3B2EF-A258-40FC-8984-55A2CDC2E2FE Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Hi hackers! We would like to get the community's opinion on an architecture for running logical decoding on a standby that is fed only by the WAL archive (restore_command), with no streaming link to the primary. The goal is continuous CDC into analytics systems straight from an archive, without adding load or a feedback dependency on the primary. Three AI-written commits are attached. They are meant to make the idea concrete and testable for this discussion, not as a finished patch yet. We made several iterations of editorialization and sharpening idea and implementation. The problem ----------- Logical decoding on a standby (PG16+) keeps the catalog readable for decoding by holding back catalog_xmin, and it relies on the primary holding its own catalog_xmin back via hot_standby_feedback over a streaming connection. An archive-fed standby has no walreceiver and therefore no feedback channel: the primary vacuums catalog tuples freely, ships the resulting WAL to the archive, and when the standby replays those records they conflict with the local logical slot's catalog_xmin. The slot is invalidated, in practice roughly 2 * autovacuum_naptime after it is created. As far as we can tell, this catalog conflict is the only fundamental blocker for archive-fed decoding. All such conflicts funnel through a single choke point, ResolveRecoveryConflictWithSnapshot(), which invalidates logical slots only for catalog relations: if (IsLogicalDecodingEnabled() && isCatalogRel) InvalidateObsoleteReplicationSlots(RS_INVAL_HORIZON, ...); The records that reach it with a catalog conflict horizon are: * Heap2 PRUNE records on catalog relations (PRUNE_ON_ACCESS, PRUNE_VACUUM_SCAN, PRUNE_VACUUM_CLEANUP; flag XLHP_IS_CATALOG_REL) * B-tree delete and page-reuse records on catalog indexes (isCatalogRel) System catalogs are indexed only with B-tree, so in practice the index side is always B-tree; the other AMs' vacuum records route through the same choke point but never carry a catalog horizon here. The remaining logical-slot invalidation causes are not fundamental to archive-fed decoding and are already in the DBA's hands: RS_INVAL_WAL_LEVEL (set wal_level=logical on the primary) and RS_INVAL_WAL_REMOVED (retain enough WAL on the standby, e.g. via max_slot_wal_keep_size). Both are pre-existing knobs an operator already manages. Please correct us if there is a second fundamental obstacle we have missed -- that is one of the main things we would like to confirm. The proposed approach --------------------- Instead of supplying the missing feedback, absorb the conflict on the consumer side. A new GUC, recovery_pause_on_logical_slot_conflict (default off), changes what happens at that choke point: when replay is about to invalidate an active logical slot, recovery pauses instead. Recovery resumes as soon as no slot still blocks the conflict. In the common case that happens the moment the consumer's decoding advances the slot's catalog_xmin past the conflict horizon, which can be well before the consumer reaches the pause LSN; only a slot that is still holding catalog_xmin back (e.g. a long-running decoded transaction) has to be drained all the way to the pause LSN. For any slot that did drain to the pause LSN, recovery advances its catalog_xmin past the horizon so the following InvalidateObsoleteReplicationSlots() is a no-op; replay then continues to the next conflict. * The hot path when the GUC is off is a single boolean early-return. * It reuses the existing SetRecoveryPause / recoveryNotPausedCV machinery; no new shared memory. * Auto-resume: a periodic re-scan lets replay continue the moment nothing blocks the conflict. A slot stops blocking when its catalog_xmin advances past the conflict horizon (the normal path, via the consumer decoding/confirming, often before the pause LSN), or when it drains past the pause LSN, is dropped, advanced, or invalidated for another reason. This lets it run as an unattended service. pg_wal_replay_resume() remains a manual "give up on this slot and let it invalidate" escape hatch, and pg_promote() still breaks out via CheckForStandbyTrigger(). * Crash safety: after advancing catalog_xmin in memory, dirty slots are flushed with CheckPointReplicationSlots(false) before replay proceeds, upholding the write-before-memory-update invariant that LogicalConfirmReceivedLocation already relies on. How we test it -------------- The in-tree TAP test (054_recovery_pause_on_slot_conflict.pl) builds a workload designed to break a logical slot, and checks that it breaks without the feature and survives with it: 1. Bring up an archive-only standby from a basebackup whose archive contains a standby snapshot but no catalog-prune WAL yet, and create a logical slot while it is still consistent. 2. Churn the primary's catalog (transient tables, ANALYZE, VACUUM of pg_class / pg_attribute / pg_statistic, etc.) so the archive then carries catalog-prune records whose horizon overtakes the slot. 3. Run two standbys from the same archive: with the GUC off the slot is invalidated (the upstream behaviour, and a check that the test actually reproduces the conflict); with the GUC on a drain-and- resume loop keeps the slot alive and decodes the full change stream. A third standby checks that an explicit operator pg_wal_replay_pause() is not cleared by the GUC's auto-resume. We also ran an end-to-end field test outside core, on a real archive-only standby recovering from object storage via WAL-G, with a pgbench workload plus deliberate catalog churn on the primary and a pg_recvlogical consumer on the standby. The consumer was forced to lag so the slot fell behind the prune horizon; recovery paused, the consumer caught up, recovery auto-resumed, and the full change stream arrived with no gaps -- while a GUC-off control standby lost its slot. Design and results [0]. Questions for the list ---------------------- 1. Direction. Is "pause recovery until the consumer catches up" an acceptable shape for this, or is the right long-term answer a feedback mechanism that does not require a streaming connection (e.g. an out-of-band way to publish a decoding standby's catalog_xmin back to the primary)? Pausing trades standby freshness for slot survival, which is fine for a dedicated decoding replica but bad for one that is also an HA target. 2. API. A cluster-wide GUC that stalls all of replay for the benefit of one slot is coarse. Would a per-slot property be cleaner -- e.g. a slot option that opts that slot into "hold recovery rather than invalidate me", so unrelated standbys and slots are unaffected? That also makes the backpressure explicit: a slow consumer on one slot deliberately holds the standby back. 3. Writing slot fields from the startup process. Advancing catalog_xmin during recovery, and flushing slots from the startup process, is new. Does the crash-safety argument above hold up, and are there concurrency concerns beyond the synced-slot and in-progress-snapbuild cases the commits already skip? 4. Fit with existing CDC installations. From a consumer's point of view (pg_recvlogical, Debezium, etc.) this looks like ordinary streaming from a standby with occasional stalls, so it should drop into existing pipelines. Is that the right integration point, or would operators prefer a separate "decoding from archive" tool that never runs a full standby at all? We are most interested in (1) and (2): whether this is the right layer to solve the problem, and whether the interface can be made narrower and less surprising than a global recovery pause. Best regards, Andrey, Kirk, Nik. [0] https://github.com/NikolayS/postgres/issues/43 --Apple-Mail=_7CF3B2EF-A258-40FC-8984-55A2CDC2E2FE Content-Disposition: attachment; filename=v1-0001-xlogrecovery-make-ConfirmRecoveryPaused-and-Check.patch Content-Type: application/octet-stream; x-unix-mode=0644; name="v1-0001-xlogrecovery-make-ConfirmRecoveryPaused-and-Check.patch" Content-Transfer-Encoding: quoted-printable =46rom=2019515be1ec3818e4c2ae687dca00240e79b6747d=20Mon=20Sep=2017=20= 00:00:00=202001=0AFrom:=20Nik=20Samokhvalov=20= =0ADate:=20Wed,=2027=20May=202026=2010:56:29=20= -0700=0ASubject:=20[PATCH=20v1=201/3]=20xlogrecovery:=20make=20= ConfirmRecoveryPaused=20and=0A=20CheckForStandbyTrigger=20extern=0A= MIME-Version:=201.0=0AContent-Type:=20text/plain;=20charset=3DUTF-8=0A= Content-Transfer-Encoding:=208bit=0A=0AMaybePauseOnLogicalSlotConflict=20= (introduced=20in=20the=20next=20commit)=20runs=0Ainside=20= ResolveRecoveryConflictWithSnapshot,=20which=20is=20called=20from=20the=0A= WAL=20apply=20path=20rather=20than=20the=20main=20recovery=20loop.=20=20= Its=20wait=20loop=20must=0Ado=20the=20same=20two=20things=20= recoveryPausesHere()=20does:=0A=0A=20=201.=20Transition=20= RECOVERY_PAUSE_REQUESTED=20->=20RECOVERY_PAUSED=20so=20that=0A=20=20=20=20= =20pg_wal_replay_resume()=20can=20release=20the=20pause.=0A=20=202.=20= Check=20for=20a=20promote=20signal=20so=20that=20pg_promote()=20does=20= not=20stall=0A=20=20=20=20=20while=20the=20startup=20process=20is=20= sleeping=20inside=20the=20slot-conflict=20wait.=0A=0ABoth=20are=20= currently=20static.=20=20Remove=20the=20static=20qualifier=20and=20add=20= extern=0Adeclarations=20to=20xlogrecovery.h=20so=20standby.c=20can=20= call=20them.=0A=0ANo=20behaviour=20change=20=E2=80=94=20only=20= visibility=20changes.=0A=0ACo-Authored-By:=20Claude=20Sonnet=204.6=20= =0A---=0A=20= src/backend/access/transam/xlogrecovery.c=20|=2010=20++++++----=0A=20= src/include/access/xlogrecovery.h=20=20=20=20=20=20=20=20=20|=20=202=20= ++=0A=202=20files=20changed,=208=20insertions(+),=204=20deletions(-)=0A=0A= diff=20--git=20a/src/backend/access/transam/xlogrecovery.c=20= b/src/backend/access/transam/xlogrecovery.c=0Aindex=20= 73b78a83fa7..80282e4689e=20100644=0A---=20= a/src/backend/access/transam/xlogrecovery.c=0A+++=20= b/src/backend/access/transam/xlogrecovery.c=0A@@=20-363,7=20+363,8=20@@=20= static=20bool=20recoveryStopsAfter(XLogReaderState=20*record);=0A=20= static=20char=20*getRecoveryStopReason(void);=0A=20static=20void=20= recoveryPausesHere(bool=20endOfRecovery);=0A=20static=20bool=20= recoveryApplyDelay(XLogReaderState=20*record);=0A-static=20void=20= ConfirmRecoveryPaused(void);=0A+/*=20Exposed=20for=20the=20= logical-slot-conflict=20recovery-pause=20logic=20in=20standby.c.=20*/=0A= +void=20ConfirmRecoveryPaused(void);=0A=20=0A=20static=20XLogRecord=20= *ReadRecord(XLogPrefetcher=20*xlogprefetcher,=0A=20=09=09=09=09=09=09=09=20= =20int=20emode,=20bool=20fetching_ckpt,=0A@@=20-386,7=20+387,8=20@@=20= static=20int=09XLogFileRead(XLogSegNo=20segno,=20TimeLineID=20tli,=0A=20=09= =09=09=09=09=09=20XLogSource=20source,=20bool=20notfoundOk);=0A=20static=20= int=09XLogFileReadAnyTLI(XLogSegNo=20segno,=20XLogSource=20source);=0A=20= =0A-static=20bool=20CheckForStandbyTrigger(void);=0A+/*=20Exposed=20for=20= the=20logical-slot-conflict=20recovery-pause=20logic=20in=20standby.c.=20= */=0A+bool=20CheckForStandbyTrigger(void);=0A=20static=20void=20= SetPromoteIsTriggered(void);=0A=20static=20bool=20= HotStandbyActiveInReplay(void);=0A=20=0A@@=20-3083,7=20+3085,7=20@@=20= SetRecoveryPause(bool=20recoveryPause)=0A=20=20*=20Confirm=20the=20= recovery=20pause=20by=20setting=20the=20recovery=20pause=20state=20to=0A=20= =20*=20RECOVERY_PAUSED.=0A=20=20*/=0A-static=20void=0A+void=0A=20= ConfirmRecoveryPaused(void)=0A=20{=0A=20=09/*=20If=20recovery=20pause=20= is=20requested=20then=20set=20it=20paused=20*/=0A@@=20-4438,7=20+4440,7=20= @@=20SetPromoteIsTriggered(void)=0A=20/*=0A=20=20*=20Check=20whether=20a=20= promote=20request=20has=20arrived.=0A=20=20*/=0A-static=20bool=0A+bool=0A= =20CheckForStandbyTrigger(void)=0A=20{=0A=20=09if=20= (LocalPromoteIsTriggered)=0Adiff=20--git=20= a/src/include/access/xlogrecovery.h=20= b/src/include/access/xlogrecovery.h=0Aindex=20ba7750dca0b..1683ec14a5a=20= 100644=0A---=20a/src/include/access/xlogrecovery.h=0A+++=20= b/src/include/access/xlogrecovery.h=0A@@=20-213,6=20+213,8=20@@=20extern=20= bool=20HotStandbyActive(void);=0A=20extern=20XLogRecPtr=20= GetXLogReplayRecPtr(TimeLineID=20*replayTLI);=0A=20extern=20= RecoveryPauseState=20GetRecoveryPauseState(void);=0A=20extern=20void=20= SetRecoveryPause(bool=20recoveryPause);=0A+extern=20void=20= ConfirmRecoveryPaused(void);=0A+extern=20bool=20= CheckForStandbyTrigger(void);=0A=20extern=20void=20= GetXLogReceiptTime(TimestampTz=20*rtime,=20bool=20*fromStream);=0A=20= extern=20TimestampTz=20GetLatestXTime(void);=0A=20extern=20TimestampTz=20= GetCurrentChunkReplayStartTime(void);=0A--=20=0A2.50.1=20(Apple=20= Git-155)=0A=0A= --Apple-Mail=_7CF3B2EF-A258-40FC-8984-55A2CDC2E2FE Content-Disposition: attachment; filename=v1-0003-Auto-resume-recovery-once-the-logical-slot-confli.patch Content-Type: application/octet-stream; x-unix-mode=0644; name="v1-0003-Auto-resume-recovery-once-the-logical-slot-confli.patch" Content-Transfer-Encoding: quoted-printable =46rom=2044eceb392066607a497e109eb5f0f8a42e4a658e=20Mon=20Sep=2017=20= 00:00:00=202001=0AFrom:=20Claude=20=0ADate:=20= Wed,=2022=20Apr=202026=2018:05:46=20+0000=0ASubject:=20[PATCH=20v1=20= 3/3]=20Auto-resume=20recovery=20once=20the=20logical=20slot=20conflict=20= is=0A=20resolved=0AMIME-Version:=201.0=0AContent-Type:=20text/plain;=20= charset=3DUTF-8=0AContent-Transfer-Encoding:=208bit=0A=0AThe=20previous=20= behavior=20under=20recovery_pause_on_logical_slot_conflict=0Arequired=20= the=20operator=20to=20both=20drain=20(or=20drop=20/=20advance)=20the=20= slot=20AND=0Acall=20pg_wal_replay_resume()=20to=20continue=20=E2=80=94=20= two=20steps,=20even=20though=20the=0Afirst=20step=20is=20the=20one=20= that=20matters=20semantically.=20That=20split=20also=20meant=0Athe=20= feature=20couldn't=20underpin=20a=20continuous-CDC=20service=20without=0A= external=20orchestration=20to=20issue=20the=20resume.=0A=0ALift=20the=20= scan=20predicate=20("does=20any=20slot=20in=20`dboid`=20still=20block=20= this=0Aconflict?")=20out=20of=20the=20initial=20check=20into=20a=20= helper=0AAnySlotStillBlocksConflict().=20Call=20it=20again=20every=201s=20= inside=20the=0Aexisting=20wait=20loop.=20When=20it=20returns=20false,=20= flip=20the=20pause=20state=20to=0ANOT_PAUSED=20and=20let=20the=20loop=20= exit;=20the=20existing=20post-wait=20advance=20then=0Abumps=20= catalog_xmin=20past=20the=20horizon=20on=20drained=20slots=20so=20the=0A= fall-through=20InvalidateObsoleteReplicationSlots()=20is=20a=20no-op.=0A=0A= "No=20longer=20blocking"=20covers=20every=20unblock=20path,=20not=20just=20= drain:=0A=0A=20=20*=20drained=20past=20the=20pause=20LSN=20= (confirmed_flush=20>=3D=20captured=0A=20=20=20=20conflict_lsn)=20=E2=80=94= =20the=20main=20case=0A=20=20*=20slot=20dropped=20= (pg_drop_replication_slot)=20=E2=80=94=20removed=20from=20the=20scan=0A=20= =20*=20slot=20advanced=20(pg_replication_slot_advance)=20=E2=80=94=20= catalog_xmin=20moves=0A=20=20=20=20past=20the=20horizon=0A=20=20*=20slot=20= invalidated=20for=20another=20reason=20(e.g.=20RS_INVAL_WAL_REMOVED=0A=20= =20=20=20from=20max_slot_wal_keep_size,=20applied=20by=20the=20= checkpointer,=20which=0A=20=20=20=20runs=20even=20while=20the=20startup=20= process=20is=20asleep=20in=20our=20wait=20loop)=0A=20=20=20=20=E2=80=94=20= data.invalidated=20!=3D=20RS_INVAL_NONE,=20scan=20skips=20it=0A=0AManual=20= pg_wal_replay_resume()=20still=20works=20as=20the=20"give=20up=20on=20= this=0Aslot=20and=20let=20it=20invalidate"=20escape=20hatch,=20and=20= CheckForStandbyTrigger=0Astill=20breaks=20the=20loop=20for=20= pg_promote().=0A=0ACapture=20conflict_lsn=20once=20at=20pause=20time=20= and=20reuse=20it=20for=20both=20the=0Ain-wait=20predicate=20and=20the=20= post-wait=20advance,=20replacing=20the=20redundant=0Asecond=20= GetXLogReplayRecPtr()=20call.=0A=0AGUC=20long_desc,=20= postgresql.conf.sample=20comment,=20and=20the=20xlogrecovery.c=0A= variable-decl=20comment=20updated=20to=20describe=20auto-resume.=0A---=0A= =20src/backend/access/transam/xlogrecovery.c=20=20=20=20=20|=20=20=205=20= +-=0A=20src/backend/storage/ipc/standby.c=20=20=20=20=20=20=20=20=20=20=20= =20=20|=20216=20+++++++++++-------=0A=20= src/backend/utils/misc/guc_parameters.dat=20=20=20=20=20|=20=20=202=20+-=0A= =20src/backend/utils/misc/postgresql.conf.sample=20|=20=20=204=20+-=0A=20= .../t/054_recovery_pause_on_slot_conflict.pl=20=20|=20120=20+++++++++-=0A= =205=20files=20changed,=20255=20insertions(+),=2092=20deletions(-)=0A=0A= diff=20--git=20a/src/backend/access/transam/xlogrecovery.c=20= b/src/backend/access/transam/xlogrecovery.c=0Aindex=20= 508e718169c..a2b0fd4ef12=20100644=0A---=20= a/src/backend/access/transam/xlogrecovery.c=0A+++=20= b/src/backend/access/transam/xlogrecovery.c=0A@@=20-100,7=20+100,10=20@@=20= int=09=09=09recovery_min_apply_delay=20=3D=200;=0A=20=20*=20If=20true,=20= when=20WAL=20replay=20on=20a=20standby=20is=20about=20to=20invalidate=20= an=20otherwise-=0A=20=20*=20active=20logical=20replication=20slot=20= because=20a=20catalog=20PRUNE_ON_ACCESS=20record's=0A=20=20*=20= snapshotConflictHorizon=20has=20overtaken=20the=20slot's=20catalog_xmin,=20= pause=20replay=0A-=20*=20instead=20and=20give=20an=20operator=20a=20= chance=20to=20drain=20(or=20drop)=20the=20slot.=0A+=20*=20instead.=20= Replay=20auto-resumes=20once=20the=20consumer=20has=20drained=20the=20= slot=20past=0A+=20*=20the=20pause=20point=20(or=20the=20slot=20is=20= dropped,=20advanced,=20or=20otherwise=20no=20longer=0A+=20*=20blocking);=20= pg_wal_replay_resume()=20also=20forces=20continuation.=20See=0A+=20*=20= MaybePauseOnLogicalSlotConflict()=20in=20standby.c.=0A=20=20*=0A=20=20*=20= Motivated=20by=20blueprints/LOGICAL_DECODING_ARCHIVED_WALS.md=20=C2=A74.2.= 3=20/=20US-4:=0A=20=20*=20an=20archive-only=20logical-decoding=20standby=20= cannot=20feed=20hot_standby_feedback=0Adiff=20--git=20= a/src/backend/storage/ipc/standby.c=20= b/src/backend/storage/ipc/standby.c=0Aindex=200659f9d2169..ce467a07486=20= 100644=0A---=20a/src/backend/storage/ipc/standby.c=0A+++=20= b/src/backend/storage/ipc/standby.c=0A@@=20-514,51=20+514,37=20@@=20= ResolveRecoveryConflictWithSnapshot(TransactionId=20= snapshotConflictHorizon,=0A=20}=0A=20=0A=20/*=0A-=20*=20If=20= recovery_pause_on_logical_slot_conflict=20is=20enabled,=20and=20replay=20= is=20about=0A-=20*=20to=20apply=20a=20catalog=20PRUNE_ON_ACCESS=20record=20= whose=20snapshotConflictHorizon=0A-=20*=20would=20cause=20the=20= invalidation=20of=20at=20least=20one=20non-invalidated=20logical=20slot=0A= -=20*=20in=20the=20same=20database,=20request=20a=20recovery=20pause=20= and=20wait=20on=20the=20recovery=0A-=20*=20pause=20condition=20variable=20= until=20an=20operator=20resumes.=0A+=20*=20Returns=20true=20if=20at=20= least=20one=20non-synced=20logical=20slot=20in=20`dboid`=20still=0A+=20*=20= blocks=20replay=20past=20snapshotConflictHorizon.=0A=20=20*=0A-=20*=20On=20= resume=20the=20caller=20re-falls=20through=20to=20= InvalidateObsoleteReplicationSlots:=0A-=20*=20if=20the=20operator=20has=20= drained=20/=20dropped=20/=20advanced=20the=20slot,=20invalidation=20is=0A= -=20*=20a=20no-op;=20if=20they=20chose=20to=20resume=20without=20acting,=20= the=20slot=20is=20invalidated=0A-=20*=20as=20usual.=20This=20matches=20= the=20recovery_target_action=3Dpause=20precedent.=0A+=20*=20"Blocks"=20= means:=20the=20slot=20is=20in=20use,=20not=20invalidated,=20= snapbuild-consistent=0A+=20*=20(effective_catalog_xmin=20is=20valid=20= =E2=80=94=20skipping=20in-progress=20slots=20avoids=20a=0A+=20*=20= deadlock=20with=20DecodingContextFindStartpoint),=20and=20its=20= catalog_xmin=0A+=20*=20precedes-or-equals=20the=20horizon.=0A=20=20*=0A-=20= *=20The=20two=20parameters=20identify=20which=20slots,=20if=20any,=20= this=20prune=20record=20can=0A-=20*=20conflict=20with:=0A-=20*=20=20=20-=20= dboid:=20logical=20slots=20are=20per-database,=20so=20only=20slots=20= belonging=20to=20this=0A-=20*=20=20=20=20=20database=20can=20be=20= invalidated=20by=20a=20catalog=20prune=20happening=20here;=20slots=20in=0A= -=20*=20=20=20=20=20other=20databases=20are=20never=20affected=20and=20= must=20be=20ignored.=0A-=20*=20=20=20-=20snapshotConflictHorizon:=20the=20= xid=20threshold=20carried=20by=20the=0A-=20*=20=20=20=20=20= PRUNE_ON_ACCESS=20record.=20A=20slot=20conflicts=20iff=20its=20= catalog_xmin=0A-=20*=20=20=20=20=20precedes-or-equals=20this=20horizon=20= (i.e.=20it=20still=20needs=20catalog=20rows=20the=0A-=20*=20=20=20=20=20= prune=20is=20about=20to=20remove).=0A+=20*=20Use=20PrecedesOrEquals=20= (not=20Precedes)=20to=20match=20DetermineSlotInvalidationCause.=0A+=20*=20= Otherwise=20a=20slot=20whose=20catalog_xmin=20was=20just=20advanced=20to=20= exactly=20horizon=20by=0A+=20*=20a=20previous=20pause-and-advance=20= cycle=20fails=20to=20re-pause=20on=20the=20next=20prune=0A+=20*=20record=20= with=20the=20same=20horizon,=20yet=20would=20still=20be=20invalidated=20= by=20the=0A+=20*=20fall-through=20InvalidateObsoleteReplicationSlots=20= call.=0A=20=20*=0A-=20*=20Only=20invoked=20from=20= ResolveRecoveryConflictWithSnapshot(),=20before=20any=20buffer=0A-=20*=20= locks=20are=20taken,=20so=20pausing=20here=20does=20not=20deadlock=20= with=20anything.=0A+=20*=20Synced=20slots=20are=20skipped:=20writing=20= their=20fields=20from=20the=20startup=20process=0A+=20*=20would=20race=20= the=20slot-sync=20worker,=20and=20ALTER=20/=20DROP_REPLICATION_SLOT=20= errors=0A+=20*=20out=20on=20a=20synced=20slot=20so=20the=20= operator-facing=20recipe=20does=20not=20apply.=0A+=20*=0A+=20*=20When=20= conflict_lsn=20is=20valid=20(in-wait=20auto-resume=20check),=20slots=20= whose=0A+=20*=20confirmed_flush_lsn=20has=20reached=20conflict_lsn=20are=20= treated=20as=20not=20blocking:=0A+=20*=20the=20consumer=20has=20caught=20= up=20to=20the=20pause=20point=20and=20the=20post-wait=20advance=0A+=20*=20= code=20will=20bump=20their=20catalog_xmin=20past=20the=20horizon.=20Pass=20= InvalidXLogRecPtr=0A+=20*=20for=20the=20initial=20pause-or-not=20= decision=20(we=20don't=20yet=20have=20a=20pause=20point).=0A=20=20*/=0A= -void=0A-MaybePauseOnLogicalSlotConflict(Oid=20dboid,=20TransactionId=20= snapshotConflictHorizon)=0A+static=20bool=0A= +AnySlotStillBlocksConflict(Oid=20dboid,=20TransactionId=20= snapshotConflictHorizon,=0A+=09=09=09=09=09=09=20=20=20XLogRecPtr=20= conflict_lsn)=0A=20{=0A=20=09int=09=09=09i;=0A-=09bool=09=09= would_invalidate=20=3D=20false;=0A-=0A-=09if=20= (!recovery_pause_on_logical_slot_conflict)=0A-=09=09return;=0A-=09if=20= (!TransactionIdIsValid(snapshotConflictHorizon))=0A-=09=09return;=0A+=09= bool=09=09blocking=20=3D=20false;=0A=20=0A-=09/*=0A-=09=20*=20Scan=20for=20= a=20would-be-invalidated=20slot=20in=20the=20conflicting=20database.=0A-=09= =20*=0A-=09=20*=20Skip=20slots=20that=20have=20not=20yet=20reached=20= snapshot-builder=20consistency=0A-=09=20*=20(effective_catalog_xmin=20is=20= still=20InvalidTransactionId).=20An=20in-progress=0A-=09=20*=20slot=20= has=20not=20produced=20any=20output=20to=20a=20consumer,=20so=20= invalidating=20it=20is=0A-=09=20*=20harmless=20=E2=80=94=20the=20caller=20= can=20retry.=20Pausing=20for=20such=20a=20slot=20would=0A-=09=20*=20= deadlock:=20DecodingContextFindStartpoint=20would=20be=20waiting=20for=20= replay=0A-=09=20*=20to=20advance,=20while=20replay=20would=20be=20= waiting=20for=20the=20slot=20to=20be=20drained.=0A-=09=20*/=0A=20=09= LWLockAcquire(ReplicationSlotControlLock,=20LW_SHARED);=0A=20=09for=20(i=20= =3D=200;=20i=20<=20max_replication_slots;=20i++)=0A=20=09{=0A@@=20-566,7=20= +552,8=20@@=20MaybePauseOnLogicalSlotConflict(Oid=20dboid,=20= TransactionId=20snapshotConflictHorizon=0A=20=09=09Oid=09=09=09slot_db;=0A= =20=09=09TransactionId=20slot_xmin;=0A=20=09=09TransactionId=20= slot_effective_xmin;=0A-=09=09bool=09=09active_logical;=0A+=09=09= XLogRecPtr=09slot_confirmed;=0A+=09=09bool=09=09is_candidate;=0A=20=0A=20= =09=09if=20(!s->in_use)=0A=20=09=09=09continue;=0A@@=20-575,52=20+562,99=20= @@=20MaybePauseOnLogicalSlotConflict(Oid=20dboid,=20TransactionId=20= snapshotConflictHorizon=0A=20=09=09slot_db=20=3D=20s->data.database;=0A=20= =09=09slot_xmin=20=3D=20s->data.catalog_xmin;=0A=20=09=09= slot_effective_xmin=20=3D=20s->effective_catalog_xmin;=0A-=09=09/*=0A-=09= =09=20*=20Skip=20synced=20slots=20(managed=20by=20the=20slot-sync=20= worker=20per=0A-=09=09=20*=20sync_replication_slots).=20Writing=20their=20= fields=20from=20the=20startup=0A-=09=09=20*=20process=20would=20race=20= with=20the=20slot-sync=20worker's=20own=20updates,=20and=0A-=09=09=20*=20= the=20operator-facing=20"drain=20or=20drop=20the=20slot"=20recipe=20in=20= the=0A-=09=09=20*=20errhint=20below=20cannot=20be=20applied=20to=20a=20= synced=20slot=20(ALTER=20/=0A-=09=09=20*=20DROP_REPLICATION_SLOT=20error=20= on=20synced).=0A-=09=09=20*/=0A-=09=09active_logical=20=3D=20= (s->data.invalidated=20=3D=3D=20RS_INVAL_NONE=20&&=0A-=09=09=09=09=09=09=20= =20slot_db=20!=3D=20InvalidOid=20&&=0A-=09=09=09=09=09=09=20=20= TransactionIdIsValid(slot_effective_xmin)=20&&=0A-=09=09=09=09=09=09=20=20= !s->data.synced);=0A+=09=09slot_confirmed=20=3D=20= s->data.confirmed_flush;=0A+=09=09is_candidate=20=3D=20= (s->data.invalidated=20=3D=3D=20RS_INVAL_NONE=20&&=0A+=09=09=09=09=09=09= slot_db=20!=3D=20InvalidOid=20&&=0A+=09=09=09=09=09=09= TransactionIdIsValid(slot_effective_xmin)=20&&=0A+=09=09=09=09=09=09= !s->data.synced);=0A=20=09=09SpinLockRelease(&s->mutex);=0A=20=0A-=09=09= if=20(!active_logical)=0A+=09=09if=20(!is_candidate)=0A=20=09=09=09= continue;=0A=20=09=09if=20(slot_db=20!=3D=20dboid)=0A=20=09=09=09= continue;=0A=20=09=09if=20(!TransactionIdIsValid(slot_xmin))=0A=20=09=09=09= continue;=0A-=09=09/*=0A-=09=09=20*=20Use=20PrecedesOrEquals=20(not=20= Precedes)=20to=20match=20the=20check=20in=0A-=09=09=20*=20= DetermineSlotInvalidationCause.=20Otherwise=20a=20slot=20whose=0A-=09=09=20= *=20catalog_xmin=20was=20just=20advanced=20to=20exactly=20= conflict_horizon=20by=0A-=09=09=20*=20a=20previous=20pause-and-advance=20= cycle=20(our=20own=20resume=20code)=20will=0A-=09=09=20*=20NOT=20trigger=20= a=20pause=20here=20when=20the=20next=20prune=20record=20arrives=0A-=09=09= =20*=20with=20horizon=20=3D=3D=20catalog_xmin,=20yet=20WILL=20still=20be=20= invalidated=0A-=09=09=20*=20by=20the=20fall-through=20= InvalidateObsoleteReplicationSlots=20call.=0A-=09=09=20*/=0A-=09=09if=20= (TransactionIdPrecedesOrEquals(slot_xmin,=20snapshotConflictHorizon))=0A= -=09=09{=0A-=09=09=09would_invalidate=20=3D=20true;=0A-=09=09=09break;=0A= -=09=09}=0A+=09=09if=20(!TransactionIdPrecedesOrEquals(slot_xmin,=20= snapshotConflictHorizon))=0A+=09=09=09continue;=0A+=09=09if=20= (conflict_lsn=20!=3D=20InvalidXLogRecPtr=20&&=0A+=09=09=09slot_confirmed=20= >=3D=20conflict_lsn)=0A+=09=09=09continue;=0A+=0A+=09=09blocking=20=3D=20= true;=0A+=09=09break;=0A=20=09}=0A=20=09= LWLockRelease(ReplicationSlotControlLock);=0A=20=0A-=09if=20= (!would_invalidate)=0A+=09return=20blocking;=0A+}=0A+=0A+/*=0A+=20*=20If=20= recovery_pause_on_logical_slot_conflict=20is=20enabled,=20and=20replay=20= is=20about=0A+=20*=20to=20apply=20a=20catalog=20PRUNE_ON_ACCESS=20record=20= whose=20snapshotConflictHorizon=0A+=20*=20would=20cause=20the=20= invalidation=20of=20at=20least=20one=20non-invalidated=20logical=20slot=0A= +=20*=20in=20the=20same=20database,=20request=20a=20recovery=20pause=20= and=20wait=20until=20the=20conflict=0A+=20*=20is=20resolved.=0A+=20*=0A+=20= *=20The=20wait=20exits=20in=20any=20of:=0A+=20*=20=20=20-=20Auto-resume:=20= a=20periodic=20re-scan=20finds=20no=20slot=20still=20blocking.=20Any=20= of=0A+=20*=20=20=20=20=20draining=20past=20the=20pause=20LSN,=20dropping=20= the=20slot,=20pg_replication_slot_=0A+=20*=20=20=20=20=20advance(),=20or=20= out-of-band=20invalidation=20(e.g.=20max_slot_wal_keep_size=0A+=20*=20=20= =20=20=20applied=20by=20the=20checkpointer,=20which=20runs=20even=20= while=20startup=20is=20paused=0A+=20*=20=20=20=20=20here)=20will=20= satisfy=20this.=20The=20post-wait=20advance=20then=20bumps=20= catalog_xmin=0A+=20*=20=20=20=20=20on=20drained=20slots=20so=20the=20= fall-through=20InvalidateObsoleteReplicationSlots()=0A+=20*=20=20=20=20=20= is=20a=20no-op.=0A+=20*=20=20=20-=20Manual=20resume:=20= pg_wal_replay_resume()=20flips=20the=20state=20to=20NOT_PAUSED.=0A+=20*=20= =20=20=20=20Any=20slot=20still=20blocking=20at=20that=20point=20is=20= invalidated=20by=20the=0A+=20*=20=20=20=20=20fall-through=20=E2=80=94=20= the=20"give=20up=20on=20this=20slot"=20escape=20hatch.=0A+=20*=20=20=20-=20= Promote:=20CheckForStandbyTrigger()=20consumes=20PROMOTE_SIGNAL_FILE=20= and=20we=0A+=20*=20=20=20=20=20return=20early=20so=20the=20startup=20= process=20can=20finish=20promotion.=0A+=20*=0A+=20*=20The=20two=20= parameters=20identify=20which=20slots,=20if=20any,=20this=20prune=20= record=20can=0A+=20*=20conflict=20with:=0A+=20*=20=20=20-=20dboid:=20= logical=20slots=20are=20per-database,=20so=20only=20slots=20belonging=20= to=20this=0A+=20*=20=20=20=20=20database=20can=20be=20invalidated=20by=20= a=20catalog=20prune=20happening=20here;=20slots=20in=0A+=20*=20=20=20=20=20= other=20databases=20are=20never=20affected=20and=20must=20be=20ignored.=0A= +=20*=20=20=20-=20snapshotConflictHorizon:=20the=20xid=20threshold=20= carried=20by=20the=0A+=20*=20=20=20=20=20PRUNE_ON_ACCESS=20record.=20A=20= slot=20conflicts=20iff=20its=20catalog_xmin=0A+=20*=20=20=20=20=20= precedes-or-equals=20this=20horizon=20(i.e.=20it=20still=20needs=20= catalog=20rows=20the=0A+=20*=20=20=20=20=20prune=20is=20about=20to=20= remove).=0A+=20*=0A+=20*=20Only=20invoked=20from=20= ResolveRecoveryConflictWithSnapshot(),=20before=20any=20buffer=0A+=20*=20= locks=20are=20taken,=20so=20pausing=20here=20does=20not=20deadlock=20= with=20anything.=0A+=20*/=0A+void=0A+MaybePauseOnLogicalSlotConflict(Oid=20= dboid,=20TransactionId=20snapshotConflictHorizon)=0A+{=0A+=09XLogRecPtr=09= conflict_lsn;=0A+=09bool=09=09user_requested_pause;=0A+=0A+=09if=20= (!recovery_pause_on_logical_slot_conflict)=0A+=09=09return;=0A+=09if=20= (!TransactionIdIsValid(snapshotConflictHorizon))=0A+=09=09return;=0A+=0A= +=09if=20(!AnySlotStillBlocksConflict(dboid,=20snapshotConflictHorizon,=0A= +=09=09=09=09=09=09=09=09=09InvalidXLogRecPtr))=0A=20=09=09return;=0A=20=0A= +=09/*=0A+=09=20*=20Remember=20whether=20an=20operator=20had=20already=20= paused=20recovery=20(e.g.=20via=0A+=09=20*=20pg_wal_replay_pause())=20= before=20this=20conflict=20fired.=20If=20so,=20our=0A+=09=20*=20= auto-resume=20below=20must=20not=20clear=20that=20pause=20out=20from=20= under=20them=20=E2=80=94=20the=0A+=09=20*=20user's=20pause=20wins.=0A+=09= =20*/=0A+=09user_requested_pause=20=3D=20(GetRecoveryPauseState()=20!=3D=20= RECOVERY_NOT_PAUSED);=0A+=0A+=09conflict_lsn=20=3D=20= GetXLogReplayRecPtr(NULL);=0A+=0A=20=09ereport(LOG,=0A=20=09=09=09= (errmsg("recovery=20paused:=20WAL=20redo=20at=20%X/%X=20would=20= invalidate=20a=20logical=20replication=20slot",=0A-=09=09=09=09=09= LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))),=0A+=09=09=09=09=09= LSN_FORMAT_ARGS(conflict_lsn)),=0A=20=09=09=09=20= errdetail("snapshotConflictHorizon=20%u=20exceeds=20catalog_xmin=20of=20= at=20least=20one=20active=20logical=20slot=20in=20database=20%u.",=0A=20=09= =09=09=09=09=20=20=20snapshotConflictHorizon,=20dboid),=0A-=09=09=09=20= errhint("Drain,=20advance,=20or=20drop=20the=20slot,=20then=20execute=20= pg_wal_replay_resume().")));=0A+=09=09=09=20errhint("Recovery=20will=20= resume=20automatically=20once=20the=20slot=20is=20drained=20past=20= %X/%X,=20dropped,=20advanced,=20or=20invalidated=20for=20another=20= reason;=20pg_wal_replay_resume()=20forces=20continuation=20(invalidating=20= any=20remaining=20blocking=20slot).",=0A+=09=09=09=09=09=20= LSN_FORMAT_ARGS(conflict_lsn))));=0A=20=0A=20=09SetRecoveryPause(true);=0A= =20=0A@@=20-642,6=20+676,29=20@@=20MaybePauseOnLogicalSlotConflict(Oid=20= dboid,=20TransactionId=20snapshotConflictHorizon=0A=20=09=09=09return;=0A= =20=09=09}=0A=20=0A+=09=09/*=0A+=09=09=20*=20Auto-resume:=20if=20nothing=20= is=20still=20blocking=20this=20conflict,=20clear=0A+=09=09=20*=20the=20= pause=20and=20let=20the=20loop=20condition=20exit.=20The=20post-wait=20= advance=0A+=09=09=20*=20will=20bump=20catalog_xmin=20on=20any=20slot=20= that=20drained=20past=20conflict_lsn=0A+=09=09=20*=20so=20the=20= fall-through=20InvalidateObsoleteReplicationSlots()=20is=20a=0A+=09=09=20= *=20no-op.=20Slots=20invalidated=20out=20of=20band=20(dropped,=20= WAL-removed,=0A+=09=09=20*=20etc.)=20are=20simply=20not=20in=20the=20= scan=20anymore.=0A+=09=09=20*/=0A+=09=09if=20= (!AnySlotStillBlocksConflict(dboid,=20snapshotConflictHorizon,=0A+=09=09=09= =09=09=09=09=09=09=09conflict_lsn))=0A+=09=09{=0A+=09=09=09/*=0A+=09=09=09= =20*=20Only=20clear=20the=20pause=20we=20set=20ourselves.=20If=20the=20= operator=20had=0A+=09=09=09=20*=20already=20paused=20recovery=20before=20= the=20conflict=20fired,=20leave=20their=0A+=09=09=09=20*=20pause=20in=20= place=20=E2=80=94=20auto-resume=20must=20not=20silently=20override=20an=0A= +=09=09=09=20*=20explicit=20pg_wal_replay_pause().=20Either=20way,=20= exit=20the=0A+=09=09=09=20*=20conflict-wait=20loop=20now=20that=20= nothing=20is=20blocking.=0A+=09=09=09=20*/=0A+=09=09=09if=20= (!user_requested_pause)=0A+=09=09=09=09SetRecoveryPause(false);=0A+=09=09= =09break;=0A+=09=09}=0A+=0A=20=09=09/*=0A=20=09=09=20*=20Promote=20= RECOVERY_PAUSE_REQUESTED=20to=20RECOVERY_PAUSED=20so=20that=0A=20=09=09=20= *=20observers=20(pg_get_wal_replay_pause_state()=20/=20monitoring)=20see=20= the=0A@@=20-654,21=20+711,14=20@@=20MaybePauseOnLogicalSlotConflict(Oid=20= dboid,=20TransactionId=20snapshotConflictHorizon=0A=20=09= ConditionVariableCancelSleep();=0A=20=0A=20=09/*=0A-=09=20*=20Operator=20= has=20resumed.=20If=20they=20drained=20slot(s)=20up=20to=20(or=20past)=20= the=20LSN=0A-=09=20*=20of=20the=20about-to-be-replayed=20conflict=20= record,=20we=20trust=20that=20the=20consumer=0A-=09=20*=20downstream=20= has=20captured=20everything=20that=20needed=20the=20pre-conflict=20= catalog=0A-=09=20*=20snapshot.=20Advance=20those=20slots'=20catalog_xmin=20= past=20the=20horizon=20so=20the=0A-=09=20*=20subsequent=20= InvalidateObsoleteReplicationSlots()=20fall-through=20is=20a=0A-=09=20*=20= no-op.=20Slots=20that=20the=20operator=20did=20NOT=20drain=20are=20left=20= alone=20and=20get=0A-=09=20*=20invalidated=20normally=20=E2=80=94=20that=20= is=20the=20"I=20didn't=20act,=20just=20let=20the=20slot=0A-=09=20*=20= die"=20path.=0A-=09=20*=0A-=09=20*=20"Drained=20past=20the=20conflict=20= LSN"=20is=20defined=20as:=20the=20slot's=0A-=09=20*=20= confirmed_flush_lsn=20>=3D=20the=20LSN=20at=20which=20replay=20has=20= paused,=20which=20is=0A-=09=20*=20the=20current=20replay=20position=20= reported=20by=20GetXLogReplayRecPtr.=0A+=09=20*=20Wait=20is=20over.=20= For=20any=20slot=20whose=20consumer=20drained=20up=20to=20(or=20past)=0A= +=09=20*=20conflict_lsn,=20advance=20catalog_xmin=20past=20the=20horizon=20= so=20the=20subsequent=0A+=09=20*=20InvalidateObsoleteReplicationSlots()=20= fall-through=20is=20a=20no-op.=20Slots=0A+=09=20*=20that=20did=20not=20= drain=20are=20left=20alone=20and=20get=20invalidated=20normally=20=E2=80=94= =20the=0A+=09=20*=20"I=20didn't=20act,=20just=20let=20the=20slot=20die"=20= path=20that=20runs=20when=20an=20operator=0A+=09=20*=20manually=20= resumed=20without=20draining.=0A=20=09=20*/=0A=20=09{=0A-=09=09= XLogRecPtr=09conflict_lsn=20=3D=20GetXLogReplayRecPtr(NULL);=0A=20=09=09= int=09=09=09j;=0A=20=0A=20=09=09= LWLockAcquire(ReplicationSlotControlLock,=20LW_EXCLUSIVE);=0A@@=20-723,7=20= +773,7=20@@=20MaybePauseOnLogicalSlotConflict(Oid=20dboid,=20= TransactionId=20snapshotConflictHorizon=0A=20=09=09=09=09ereport(LOG,=0A=20= =09=09=09=09=09=09(errmsg("advanced=20catalog_xmin=20of=20logical=20slot=20= \"%s\"=20past=20conflict=20horizon=20%u",=0A=20=09=09=09=09=09=09=09=09= NameStr(s->data.name),=20snapshotConflictHorizon),=0A-=09=09=09=09=09=09=20= errdetail("Slot's=20confirmed_flush_lsn=20%X/%X=20reached=20the=20= conflict=20record=20at=20%X/%X;=20operator=20drained=20before=20= resuming.",=0A+=09=09=09=09=09=09=20errdetail("Slot's=20= confirmed_flush_lsn=20%X/%X=20reached=20the=20conflict=20record=20at=20= %X/%X;=20consumer=20drained=20past=20the=20pause=20point.",=0A=20=09=09=09= =09=09=09=09=09=20=20=20LSN_FORMAT_ARGS(s->data.confirmed_flush),=0A=20=09= =09=09=09=09=09=09=09=20=20=20LSN_FORMAT_ARGS(conflict_lsn))));=0A=20=09=09= }=0Adiff=20--git=20a/src/backend/utils/misc/guc_parameters.dat=20= b/src/backend/utils/misc/guc_parameters.dat=0Aindex=20= 52b34443ec6..709079c399d=20100644=0A---=20= a/src/backend/utils/misc/guc_parameters.dat=0A+++=20= b/src/backend/utils/misc/guc_parameters.dat=0A@@=20-2444,7=20+2444,7=20= @@=0A=20{=20name=20=3D>=20'recovery_pause_on_logical_slot_conflict',=20= type=20=3D>=20'bool',=0A=20=20=20context=20=3D>=20'PGC_SIGHUP',=20group=20= =3D>=20'REPLICATION_STANDBY',=0A=20=20=20short_desc=20=3D>=20'Pauses=20= recovery=20instead=20of=20invalidating=20an=20active=20logical=20slot=20= on=20catalog=20conflict.',=0A-=20=20long_desc=20=3D>=20'When=20WAL=20= replay=20on=20a=20standby=20is=20about=20to=20invalidate=20a=20logical=20= replication=20slot=20because=20a=20catalog=20PRUNE_ON_ACCESS=20record=20= has=20overtaken=20the=20slot\'s=20catalog_xmin,=20pause=20recovery=20= instead.=20The=20operator=20can=20then=20drain=20or=20drop=20the=20slot=20= and=20call=20pg_wal_replay_resume()=20to=20continue.',=0A+=20=20= long_desc=20=3D>=20'When=20WAL=20replay=20on=20a=20standby=20is=20about=20= to=20invalidate=20a=20logical=20replication=20slot=20because=20a=20= catalog=20PRUNE_ON_ACCESS=20record=20has=20overtaken=20the=20slot\'s=20= catalog_xmin,=20pause=20recovery=20instead.=20Recovery=20resumes=20= automatically=20once=20the=20slot=20has=20been=20drained=20past=20the=20= pause=20point,=20dropped,=20advanced,=20or=20invalidated=20for=20another=20= reason=20(e.g.=20max_slot_wal_keep_size).=20pg_wal_replay_resume()=20= also=20forces=20continuation,=20invalidating=20any=20remaining=20= blocking=20slot.',=0A=20=20=20variable=20=3D>=20= 'recovery_pause_on_logical_slot_conflict',=0A=20=20=20boot_val=20=3D>=20= 'false',=0A=20},=0Adiff=20--git=20= a/src/backend/utils/misc/postgresql.conf.sample=20= b/src/backend/utils/misc/postgresql.conf.sample=0Aindex=20= 17b2bcc4df8..414fed447cf=20100644=0A---=20= a/src/backend/utils/misc/postgresql.conf.sample=0A+++=20= b/src/backend/utils/misc/postgresql.conf.sample=0A@@=20-404,7=20+404,9=20= @@=0A=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20#=20retrieve=20WAL=20= after=20a=20failed=20attempt=0A=20#recovery_min_apply_delay=20=3D=200=20=20= =20=20=20=20=20=20=20=20=20#=20minimum=20delay=20for=20applying=20= changes=20during=20recovery=0A=20= #recovery_pause_on_logical_slot_conflict=20=3D=20off=20=20#=20pause=20= recovery=20instead=20of=20invalidating=0A-=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20#=20a=20logical=20slot=20on=20catalog=20conflict=0A+=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20#=20a=20logical=20slot=20on=20= catalog=20conflict;=0A+=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20#=20= auto-resumes=20once=20the=20slot=20is=20drained,=0A+=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20#=20dropped,=20or=20otherwise=20unblocks=0A=20= #sync_replication_slots=20=3D=20off=20=20=20=20=20=20=20=20=20=20=20#=20= enables=20slot=20synchronization=20on=20the=20physical=20standby=20from=20= the=20primary=0A=20=0A=20#=20-=20Subscribers=20-=0Adiff=20--git=20= a/src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=20= b/src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=0Aindex=20= d1a03475e95..b53bf97694e=20100644=0A---=20= a/src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=0A+++=20= b/src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=0A@@=20= -217,6=20+217,49=20@@=20sub=20wait_for_replay_paused=0A=20=20=20=20=20= return=200;=0A=20}=0A=20=0A+#=20Models=20an=20operator=20who=20issued=20= an=20explicit=20pg_wal_replay_pause()=20that=0A+#=20must=20survive=20the=20= GUC's=20auto-resume.=20On=20entry=20replay=20is=20parked=20at=20a=0A+#=20= pre-conflict=20LSN=20with=20the=20operator=20pause=20already=20in=20= effect.=20Each=20tick=20we=0A+#=20nudge=20replay=20forward=20= (pg_wal_replay_resume())=20and=20then=20immediately=0A+#=20re-assert=20= the=20operator=20pause=20(pg_wal_replay_pause()),=20so=20that=20when=20= the=0A+#=20startup=20process=20reaches=20the=20catalog-prune=20record=20= the=20operator=20pause=20is=0A+#=20already=20pending=20=E2=80=94=20i.e.=20= GetRecoveryPauseState()=20!=3D=20RECOVERY_NOT_PAUSED=0A+#=20at=20the=20= moment=20MaybePauseOnLogicalSlotConflict()=20captures=20it.=20We=20then=0A= +#=20drain=20the=20slot=20so=20the=20GUC's=20auto-resume=20re-scan=20= finds=20nothing=20blocking.=0A+#=20With=20the=20fix=20the=20operator's=20= pause=20is=20preserved;=20without=20it=20the=0A+#=20unconditional=20= SetRecoveryPause(false)=20would=20clear=20it.=0A+#=0A+#=20Returns=20the=20= total=20number=20of=20changes=20drained.=0A+sub=20= drain_holding_user_pause=0A+{=0A+=20=20=20=20my=20($standby,=20= $slot_name,=20$deadline_seconds)=20=3D=20@_;=0A+=0A+=20=20=20=20my=20= $total_drained=20=3D=200;=0A+=20=20=20=20my=20$deadline=20=3D=20time()=20= +=20$deadline_seconds;=0A+=0A+=20=20=20=20while=20(time()=20<=20= $deadline)=20{=0A+=20=20=20=20=20=20=20=20#=20Drain=20whatever=20the=20= slot=20currently=20holds.=0A+=20=20=20=20=20=20=20=20my=20$got=20=3D=20= $standby->safe_psql('postgres',=0A+=20=20=20=20=20=20=20=20=20=20=20=20= "SELECT=20COUNT(*)=20FROM=20pg_logical_slot_get_changes('$slot_name',=20= NULL,=20NULL)");=0A+=20=20=20=20=20=20=20=20$total_drained=20+=3D=20= $got;=0A+=0A+=20=20=20=20=20=20=20=20#=20Stop=20once=20the=20slot=20is=20= fully=20drained=20and=20replay=20has=20advanced=20past=0A+=20=20=20=20=20= =20=20=20#=20the=20conflict=20(nothing=20left=20to=20decode=20and=20no=20= longer=20pause-looping=0A+=20=20=20=20=20=20=20=20#=20on=20the=20GUC).=20= A=20short=20tail=20of=20zero-change=20drains=20confirms=20we=20are=0A+=20= =20=20=20=20=20=20=20#=20done.=0A+=20=20=20=20=20=20=20=20last=20if=20= $got=20=3D=3D=200=20&&=20$total_drained=20>=200;=0A+=0A+=20=20=20=20=20=20= =20=20#=20Nudge=20replay=20forward,=20then=20immediately=20re-pause=20so=20= the=20operator=0A+=20=20=20=20=20=20=20=20#=20pause=20is=20pending=20= again=20when=20the=20next=20conflict=20record=20is=20applied.=0A+=20=20=20= =20=20=20=20=20$standby->safe_psql('postgres',=20"SELECT=20= pg_wal_replay_resume()");=0A+=20=20=20=20=20=20=20=20= $standby->safe_psql('postgres',=20"SELECT=20pg_wal_replay_pause()");=0A+=0A= +=20=20=20=20=20=20=20=20usleep(500_000);=0A+=20=20=20=20}=0A+=0A+=20=20=20= =20return=20$total_drained;=0A+}=0A+=0A=20#=20= ---------------------------------------------------------------------=0A=20= #=20Main=20script=0A=20#=20= ---------------------------------------------------------------------=0A= @@=20-228,16=20+271,19=20@@=20my=20$guc=20=3D=20= $node_primary->safe_psql('postgres',=0A=20=20=20=20=20"SELECT=20COUNT(*)=20= FROM=20pg_settings=20WHERE=20name=20=3D=20= 'recovery_pause_on_logical_slot_conflict'");=0A=20is($guc,=20'1',=20= 'recovery_pause_on_logical_slot_conflict=20GUC=20is=20registered');=0A=20= =0A-#=202.=20Phase=201:=20bring=20up=20BOTH=20standbys=20(GUC-on=20and=20= GUC-off)=20while=20the=0A-#=20archive=20still=20contains=20only=20the=20= quiet-moment=20snapshot=20=E2=80=94=20no=20prune=0A-#=20records=20yet.=20= Slot=20creation=20reaches=20SNAPBUILD_CONSISTENT=20quickly=20on=0A-#=20= both.=20Later,=20when=20Phase=202=20ships=20the=20prune=20records,=20the=20= two=20standbys=0A-#=20diverge:=20the=20GUC-on=20one=20pauses=20and=20= drains;=20the=20GUC-off=20one=0A-#=20invalidates.=0A+#=202.=20Phase=201:=20= bring=20up=20the=20standbys=20(GUC-on,=20GUC-off,=20and=20a=20second=0A= +#=20GUC-on=20"user-pause"=20standby)=20while=20the=20archive=20still=20= contains=20only=20the=0A+#=20quiet-moment=20snapshot=20=E2=80=94=20no=20= prune=20records=20yet.=20Slot=20creation=20reaches=0A+#=20= SNAPBUILD_CONSISTENT=20quickly=20on=20all=20of=20them.=20Later,=20when=20= Phase=202=20ships=0A+#=20the=20prune=20records,=20the=20standbys=20= diverge:=20the=20GUC-on=20ones=20pause=20and=0A+#=20drain;=20the=20= GUC-off=20one=20invalidates.=20The=20user-pause=20standby=20additionally=0A= +#=20checks=20that=20an=20operator's=20explicit=20pause=20survives=20the=20= GUC=20auto-resume.=0A=20my=20$node_standby=20=3D=20= create_archive_standby($node_primary,=20$backup_name,=0A=20=20=20=20=20= 'standby',=20'on');=0A=20my=20$node_standby_off=20=3D=20= create_archive_standby($node_primary,=20$backup_name,=0A=20=20=20=20=20= 'standby_off',=20'off');=0A+my=20$node_standby_up=20=3D=20= create_archive_standby($node_primary,=20$backup_name,=0A+=20=20=20=20= 'standby_userpause',=20'on');=0A=20=0A=20= $node_standby->safe_psql('postgres',=20qq[=0A=20=20=20=20=20SELECT=20= pg_create_logical_replication_slot('t_slot',=20'test_decoding');=0A@@=20= -245,6=20+291,9=20@@=20$node_standby->safe_psql('postgres',=20qq[=0A=20= $node_standby_off->safe_psql('postgres',=20qq[=0A=20=20=20=20=20SELECT=20= pg_create_logical_replication_slot('t_slot_off',=20'test_decoding');=0A=20= ]);=0A+$node_standby_up->safe_psql('postgres',=20qq[=0A+=20=20=20=20= SELECT=20pg_create_logical_replication_slot('up_slot',=20= 'test_decoding');=0A+]);=0A=20=0A=20my=20$slot_ready=20=3D=20= $node_standby->safe_psql('postgres',=20qq[=0A=20=20=20=20=20SELECT=20= wal_status=20FROM=20pg_replication_slots=20WHERE=20slot_name=20=3D=20= 't_slot'=0A@@=20-257,6=20+306,21=20@@=20my=20$off_slot_ready=20=3D=20= $node_standby_off->safe_psql('postgres',=20qq[=0A=20is($off_slot_ready,=20= 'reserved',=0A=20=20=20=20"baseline=20slot=20created=20cleanly=20in=20= Phase=201=20(state:=20$off_slot_ready)");=0A=20=0A+my=20$up_slot_ready=20= =3D=20$node_standby_up->safe_psql('postgres',=20qq[=0A+=20=20=20=20= SELECT=20wal_status=20FROM=20pg_replication_slots=20WHERE=20slot_name=20= =3D=20'up_slot'=0A+]);=0A+is($up_slot_ready,=20'reserved',=0A+=20=20=20= "user-pause=20slot=20created=20cleanly=20in=20Phase=201=20(state:=20= $up_slot_ready)");=0A+=0A+#=20Operator=20pauses=20recovery=20on=20the=20= user-pause=20standby=20NOW,=20while=20the=0A+#=20archive=20still=20only=20= holds=20the=20clean=20Phase-1=20snapshot=20and=20the=20catalog-=0A+#=20= prune=20conflict=20has=20not=20been=20replayed=20yet.=20This=20parks=20= replay=20at=20a=0A+#=20pre-conflict=20LSN=20with=20an=20explicit=20= operator=20pause=20in=20effect=20=E2=80=94=20the=20exact=0A+#=20= precondition=20for=20the=20user-pause-clobber=20bug.=0A= +$node_standby_up->safe_psql('postgres',=20"SELECT=20= pg_wal_replay_pause()");=0A+ok(wait_for_replay_paused($node_standby_up),=0A= +=20=20=20"user-pause=20standby=20parks=20on=20operator=20= pg_wal_replay_pause()=20before=20conflict");=0A+=0A=20#=203.=20Phase=20= 2:=20catalog=20churn=20on=20primary,=20then=20wait=20for=20archive.=0A=20= run_catalog_churn($node_primary);=0A=20=0A@@=20-322,6=20+386,50=20@@=20= cmp_ok($elapsed,=20'<',=2010,=0A=20=20=20=20=20"pg_promote=20completed=20= in=20under=2010s=20(actual:=20${elapsed}s)");=0A=20=0A=20= $node_standby_p->stop;=0A+=0A+#=207.=20User-pause=20survives=20= auto-resume.=20The=20operator=20paused=20recovery=20with=0A+#=20= pg_wal_replay_pause()=20before=20the=20conflict=20record=20was=20= replayed=20(done=20in=0A+#=20section=202).=20drain_holding_user_pause=20= nudges=20replay=20into=20the=20conflict=0A+#=20while=20keeping=20that=20= operator=20pause=20pending,=20then=20drains=20the=20slot=20so=20the=0A+#=20= GUC's=20auto-resume=20re-scan=20finds=20nothing=20blocking.=20The=20fix=20= in=0A+#=20MaybePauseOnLogicalSlotConflict()=20must=20then=20leave=20the=20= operator's=20pause=0A+#=20in=20place=20rather=20than=20clearing=20it=20= with=20an=20unconditional=0A+#=20SetRecoveryPause(false),=20so:=0A+#=20=20= =20-=20with=20the=20fix:=20replay=20stays=20'paused'=20after=20the=20= conflict=20resolves;=0A+#=20=20=20-=20without=20the=20fix:=20auto-resume=20= clears=20the=20pause=20and=20replay=20proceeds.=0A+my=20$up_drained=20=3D=20= drain_holding_user_pause($node_standby_up,=20'up_slot',=2060);=0A+=0A= +cmp_ok($up_drained,=20'>=3D',=202000,=0A+=20=20=20=20"user-pause=20= standby=20drained=20the=20slot=20under=20operator=20pause=20($up_drained=20= got)");=0A+=0A+#=20The=20slot=20must=20have=20survived=20(drained,=20not=20= invalidated)=20just=20like=20the=0A+#=20plain=20GUC-on=20standby.=0A+my=20= $up_slot_state=20=3D=20$node_standby_up->safe_psql('postgres',=20qq[=0A+=20= =20=20=20SELECT=20wal_status=20||=20'|'=20||=20= COALESCE(invalidation_reason,=20'')=0A+=20=20=20=20FROM=20= pg_replication_slots=20WHERE=20slot_name=20=3D=20'up_slot';=0A+]);=0A= +like($up_slot_state,=20qr/^reserved\|/,=0A+=20=20=20=20=20"user-pause=20= slot=20survived=20catalog=20prune=20(state:=20$up_slot_state)");=0A+=0A= +#=20The=20crux:=20recovery=20is=20STILL=20paused=20because=20the=20= operator's=20pause=20was=20not=0A+#=20cleared=20by=20the=20GUC's=20= auto-resume.=0A+my=20$up_pause_state=20=3D=20= $node_standby_up->safe_psql('postgres',=0A+=20=20=20=20"SELECT=20= pg_get_wal_replay_pause_state()");=0A+is($up_pause_state,=20'paused',=0A= +=20=20=20"operator=20pause=20survived=20GUC=20auto-resume=20(state:=20= $up_pause_state)");=0A+=0A+#=20Now=20the=20operator=20resumes=20and=20= replay=20must=20proceed=20past=20the=20pause.=0A+my=20$up_lsn_before=20=3D= =20$node_standby_up->safe_psql('postgres',=0A+=20=20=20=20"SELECT=20= pg_last_wal_replay_lsn()");=0A+$node_standby_up->safe_psql('postgres',=20= "SELECT=20pg_wal_replay_resume()");=0A= +$node_standby_up->poll_query_until('postgres',=0A+=20=20=20=20"SELECT=20= pg_get_wal_replay_pause_state()=20=3D=20'not=20paused'")=0A+=20=20=20=20= or=20die=20"replay=20did=20not=20leave=20paused=20state=20after=20= operator=20resume";=0A+ok($node_standby_up->poll_query_until('postgres',=0A= +=20=20=20=20"SELECT=20pg_last_wal_replay_lsn()=20>=3D=20= '$up_lsn_before'::pg_lsn"),=0A+=20=20=20"replay=20proceeds=20after=20= operator=20pg_wal_replay_resume()");=0A+=0A+$node_standby_up->stop;=0A=20= $node_standby_off->stop;=0A=20$node_standby->stop;=0A=20= $node_primary->stop;=0A--=20=0A2.50.1=20(Apple=20Git-155)=0A=0A= --Apple-Mail=_7CF3B2EF-A258-40FC-8984-55A2CDC2E2FE Content-Disposition: attachment; filename=v1-0002-Add-recovery_pause_on_logical_slot_conflict-GUC.patch Content-Type: application/octet-stream; x-unix-mode=0644; name="v1-0002-Add-recovery_pause_on_logical_slot_conflict-GUC.patch" Content-Transfer-Encoding: quoted-printable =46rom=20d26b124be205e522a255dbacd330f961d957110c=20Mon=20Sep=2017=20= 00:00:00=202001=0AFrom:=20Nik=20Samokhvalov=20= =0ADate:=20Wed,=2027=20May=202026=2010:58:07=20= -0700=0ASubject:=20[PATCH=20v1=202/3]=20Add=20= recovery_pause_on_logical_slot_conflict=20GUC=0AMIME-Version:=201.0=0A= Content-Type:=20text/plain;=20charset=3DUTF-8=0A= Content-Transfer-Encoding:=208bit=0A=0AAdd=20a=20new=20GUC,=20= recovery_pause_on_logical_slot_conflict=20(PGC_SIGHUP,=0Adefault=20off).=20= When=20enabled,=20WAL=20replay=20on=20a=20standby=20pauses=20instead=20= of=0Ainvalidating=20an=20active=20logical=20replication=20slot=20whose=20= catalog_xmin=0Awould=20be=20overtaken=20by=20a=20Heap2/PRUNE_ON_ACCESS=20= record's=0AsnapshotConflictHorizon.=20An=20operator=20can=20then=20drain=20= the=20slot=20via=0Apg_logical_slot_get_changes=20and=20call=20= pg_wal_replay_resume()=20to=0Acontinue.=20On=20resume,=20the=20slot's=20= catalog_xmin=20is=20advanced=20past=20the=0Aconflict=20horizon=20so=20= the=20subsequent=20InvalidateObsoleteReplicationSlots=0Acall=20becomes=20= a=20no-op;=20replay=20continues=20to=20the=20next=20conflict=20and=20the=0A= cycle=20repeats.=0A=0AThis=20makes=20logical=20decoding=20from=20an=20= archive-only=20standby=20(no=20streaming=0Areplication=20link=20to=20the=20= primary)=20viable=20for=20continuous=20CDC.=20Without=0Athis=20GUC,=20= slots=20on=20such=20standbys=20are=20invalidated=20the=20first=20time=20= replay=0Aapplies=20a=20catalog=20vacuum=20record=20whose=20horizon=20= exceeds=20the=20slot's=0Acatalog_xmin=20=E2=80=94=20typically=20~2=20*=20= autovacuum_naptime=20after=20slot=20creation.=0A=0AHooks=20into=20= ResolveRecoveryConflictWithSnapshot(),=20the=20single=20choke=0Apoint=20= in=20the=20replay=20path=20for=20RS_INVAL_HORIZON=20conflicts,=20via=20a=20= new=0AMaybePauseOnLogicalSlotConflict()=20function=20in=20standby.c.=20= Reuses=20the=0Aexisting=20SetRecoveryPause=20/=20recoveryNotPausedCV=20= machinery=20=E2=80=94=20no=20new=0Ashared-memory=20state.=20Hot=20path=20= when=20GUC=20off=20is=20one=20boolean=20early-return.=0A=0AEdge=20cases=20= handled:=0A-=20Slots=20still=20inside=20DecodingContextFindStartpoint=0A=20= =20(effective_catalog_xmin=20not=20yet=20valid)=20are=20skipped.=20= Pausing=20for=20them=0A=20=20would=20deadlock:=20snapbuild=20needs=20WAL=20= to=20advance,=20pause=20holds=20it=20back.=0A=20=20Invalidating=20an=20= in-progress=20slot=20is=20harmless=20=E2=80=94=20the=20caller=20retries.=0A= -=20Pause-check=20uses=20TransactionIdPrecedesOrEquals=20to=20match=20= the=0A=20=20semantics=20of=20DetermineSlotInvalidationCause.=20Without=20= that,=20a=20slot=0A=20=20whose=20catalog_xmin=20was=20just=20advanced=20= to=20horizon+1=20by=20a=20previous=0A=20=20pause=20cycle=20would=20fail=20= to=20re-pause=20on=20a=20subsequent=20record=20with=0A=20=20horizon=20=3D=3D= =20catalog_xmin,=20yet=20would=20still=20be=20invalidated.=0A-=20= CheckForStandbyTrigger()=20is=20called=20in=20the=20wait=20loop=20so=20= pg_promote()=0A=20=20does=20not=20stall=20while=20paused.=20Mirrors=20= the=20existing=20recoveryPausesHere=0A=20=20escape=20loop.=0A-=20Synced=20= slots=20(data.synced=20=3D=3D=20true)=20are=20skipped=20in=20both=20the=0A= =20=20pause-check=20and=20advance=20scans.=20Writing=20to=20their=20= fields=20from=20the=0A=20=20startup=20process=20would=20race=20with=20= the=20slot-sync=20worker.=0A=0ACrash=20safety:=20after=20advancing=20= catalog_xmin=20in=20memory,=20dirty=20slots=20are=0Aflushed=20to=20disk=20= immediately=20via=20CheckPointReplicationSlots(false)=20before=0A= returning.=20This=20upholds=20the=20write-before-memory-update=20= invariant=0Aestablished=20by=20LogicalConfirmReceivedLocation=20= (logical.c):=20the=20on-disk=0Astate=20must=20reflect=20any=20advance=20= before=20the=20in-memory=20value=20becomes=0Avisible,=20so=20that=20= vacuum=20cannot=20reclaim=20catalog=20tuples=20the=20slot=20still=0A= needs.=20Deferring=20to=20the=20next=20restartpoint=20would=20leave=20a=20= crash=20window.=0A=0AIncludes=20a=20TAP=20test=20= (050_recovery_pause_on_slot_conflict.pl)=20covering:=0A-=20GUC=20= registration=0A-=20Slot=20survival=20through=20catalog=20PRUNE_ON_ACCESS=20= records=20(GUC=20on)=0A-=20Baseline=20slot=20invalidation=20(GUC=20off,=20= unchanged=20upstream=20behaviour)=0A-=20pg_promote()=20succeeds=20in=20= under=2010=20s=20while=20the=20standby=20is=20paused=0A=20=20(guards=20= the=20CheckForStandbyTrigger()=20escape=20path)=0A=0ACo-Authored-By:=20= Claude=20Sonnet=204.6=20=0A---=0A=20= src/backend/access/transam/xlogrecovery.c=20=20=20=20=20|=20=2019=20+=0A=20= src/backend/storage/ipc/standby.c=20=20=20=20=20=20=20=20=20=20=20=20=20= |=20235=20+++++++++++++=0A=20src/backend/utils/misc/guc_parameters.dat=20= =20=20=20=20|=20=20=208=20+=0A=20= src/backend/utils/misc/postgresql.conf.sample=20|=20=20=202=20+=0A=20= src/include/access/xlogrecovery.h=20=20=20=20=20=20=20=20=20=20=20=20=20= |=20=20=201=20+=0A=20src/include/storage/standby.h=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20|=20=20=202=20+=0A=20= src/test/recovery/meson.build=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20|=20=20=201=20+=0A=20.../t/054_recovery_pause_on_slot_conflict.pl=20= =20|=20329=20++++++++++++++++++=0A=208=20files=20changed,=20597=20= insertions(+)=0A=20create=20mode=20100644=20= src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=0A=0Adiff=20= --git=20a/src/backend/access/transam/xlogrecovery.c=20= b/src/backend/access/transam/xlogrecovery.c=0Aindex=20= 80282e4689e..508e718169c=20100644=0A---=20= a/src/backend/access/transam/xlogrecovery.c=0A+++=20= b/src/backend/access/transam/xlogrecovery.c=0A@@=20-96,6=20+96,21=20@@=20= const=20char=20*recoveryTargetName;=0A=20XLogRecPtr=09recoveryTargetLSN;=0A= =20int=09=09=09recovery_min_apply_delay=20=3D=200;=0A=20=0A+/*=0A+=20*=20= If=20true,=20when=20WAL=20replay=20on=20a=20standby=20is=20about=20to=20= invalidate=20an=20otherwise-=0A+=20*=20active=20logical=20replication=20= slot=20because=20a=20catalog=20PRUNE_ON_ACCESS=20record's=0A+=20*=20= snapshotConflictHorizon=20has=20overtaken=20the=20slot's=20catalog_xmin,=20= pause=20replay=0A+=20*=20instead=20and=20give=20an=20operator=20a=20= chance=20to=20drain=20(or=20drop)=20the=20slot.=0A+=20*=0A+=20*=20= Motivated=20by=20blueprints/LOGICAL_DECODING_ARCHIVED_WALS.md=20=C2=A74.2.= 3=20/=20US-4:=0A+=20*=20an=20archive-only=20logical-decoding=20standby=20= cannot=20feed=20hot_standby_feedback=0A+=20*=20to=20the=20primary,=20so=20= it=20has=20no=20natural=20way=20to=20keep=20the=20primary's=20catalog=0A= +=20*=20horizon=20pinned.=20Without=20this=20GUC,=20any=20logical=20slot=20= created=20on=20such=20a=0A+=20*=20standby=20is=20invalidated=20the=20= first=20time=20replay=20applies=20a=20catalog=20vacuum=0A+=20*=20record=20= whose=20horizon=20exceeds=20the=20slot's=20catalog_xmin.=0A+=20*/=0A= +bool=09=09recovery_pause_on_logical_slot_conflict=20=3D=20false;=0A+=0A=20= /*=20options=20formerly=20taken=20from=20recovery.conf=20for=20XLOG=20= streaming=20*/=0A=20char=09=20=20=20*PrimaryConnInfo=20=3D=20NULL;=0A=20= char=09=20=20=20*PrimarySlotName=20=3D=20NULL;=0A@@=20-4440,6=20+4455,10=20= @@=20SetPromoteIsTriggered(void)=0A=20/*=0A=20=20*=20Check=20whether=20a=20= promote=20request=20has=20arrived.=0A=20=20*/=0A+/*=0A+=20*=20= Non-static:=20MaybePauseOnLogicalSlotConflict=20needs=20this=20to=20= break=20its=20wait=0A+=20*=20loop=20on=20promotion,=20same=20as=20= recoveryPausesHere=20does.=0A+=20*/=0A=20bool=0A=20= CheckForStandbyTrigger(void)=0A=20{=0Adiff=20--git=20= a/src/backend/storage/ipc/standby.c=20= b/src/backend/storage/ipc/standby.c=0Aindex=20de9092fdf5b..0659f9d2169=20= 100644=0A---=20a/src/backend/storage/ipc/standby.c=0A+++=20= b/src/backend/storage/ipc/standby.c=0A@@=20-24,8=20+24,11=20@@=0A=20= #include=20"access/xlogutils.h"=0A=20#include=20"miscadmin.h"=0A=20= #include=20"pgstat.h"=0A+#include=20"postmaster/startup.h"=0A=20#include=20= "replication/slot.h"=0A+#include=20"storage/condition_variable.h"=0A=20= #include=20"storage/bufmgr.h"=0A+#include=20"storage/lwlock.h"=0A=20= #include=20"storage/proc.h"=0A=20#include=20"storage/procarray.h"=0A=20= #include=20"storage/sinvaladt.h"=0A@@=20-503,8=20+506,240=20@@=20= ResolveRecoveryConflictWithSnapshot(TransactionId=20= snapshotConflictHorizon,=0A=20=09=20*=20reached,=20e.g.=20due=20to=20= using=20a=20physical=20replication=20slot.=0A=20=09=20*/=0A=20=09if=20= (IsLogicalDecodingEnabled()=20&&=20isCatalogRel)=0A+=09{=0A+=09=09= MaybePauseOnLogicalSlotConflict(locator.dbOid,=20= snapshotConflictHorizon);=0A=20=09=09= InvalidateObsoleteReplicationSlots(RS_INVAL_HORIZON,=200,=20= locator.dbOid,=0A=20=09=09=09=09=09=09=09=09=09=09=20=20=20= snapshotConflictHorizon);=0A+=09}=0A+}=0A+=0A+/*=0A+=20*=20If=20= recovery_pause_on_logical_slot_conflict=20is=20enabled,=20and=20replay=20= is=20about=0A+=20*=20to=20apply=20a=20catalog=20PRUNE_ON_ACCESS=20record=20= whose=20snapshotConflictHorizon=0A+=20*=20would=20cause=20the=20= invalidation=20of=20at=20least=20one=20non-invalidated=20logical=20slot=0A= +=20*=20in=20the=20same=20database,=20request=20a=20recovery=20pause=20= and=20wait=20on=20the=20recovery=0A+=20*=20pause=20condition=20variable=20= until=20an=20operator=20resumes.=0A+=20*=0A+=20*=20On=20resume=20the=20= caller=20re-falls=20through=20to=20InvalidateObsoleteReplicationSlots:=0A= +=20*=20if=20the=20operator=20has=20drained=20/=20dropped=20/=20advanced=20= the=20slot,=20invalidation=20is=0A+=20*=20a=20no-op;=20if=20they=20chose=20= to=20resume=20without=20acting,=20the=20slot=20is=20invalidated=0A+=20*=20= as=20usual.=20This=20matches=20the=20recovery_target_action=3Dpause=20= precedent.=0A+=20*=0A+=20*=20The=20two=20parameters=20identify=20which=20= slots,=20if=20any,=20this=20prune=20record=20can=0A+=20*=20conflict=20= with:=0A+=20*=20=20=20-=20dboid:=20logical=20slots=20are=20per-database,=20= so=20only=20slots=20belonging=20to=20this=0A+=20*=20=20=20=20=20database=20= can=20be=20invalidated=20by=20a=20catalog=20prune=20happening=20here;=20= slots=20in=0A+=20*=20=20=20=20=20other=20databases=20are=20never=20= affected=20and=20must=20be=20ignored.=0A+=20*=20=20=20-=20= snapshotConflictHorizon:=20the=20xid=20threshold=20carried=20by=20the=0A= +=20*=20=20=20=20=20PRUNE_ON_ACCESS=20record.=20A=20slot=20conflicts=20= iff=20its=20catalog_xmin=0A+=20*=20=20=20=20=20precedes-or-equals=20this=20= horizon=20(i.e.=20it=20still=20needs=20catalog=20rows=20the=0A+=20*=20=20= =20=20=20prune=20is=20about=20to=20remove).=0A+=20*=0A+=20*=20Only=20= invoked=20from=20ResolveRecoveryConflictWithSnapshot(),=20before=20any=20= buffer=0A+=20*=20locks=20are=20taken,=20so=20pausing=20here=20does=20not=20= deadlock=20with=20anything.=0A+=20*/=0A+void=0A= +MaybePauseOnLogicalSlotConflict(Oid=20dboid,=20TransactionId=20= snapshotConflictHorizon)=0A+{=0A+=09int=09=09=09i;=0A+=09bool=09=09= would_invalidate=20=3D=20false;=0A+=0A+=09if=20= (!recovery_pause_on_logical_slot_conflict)=0A+=09=09return;=0A+=09if=20= (!TransactionIdIsValid(snapshotConflictHorizon))=0A+=09=09return;=0A+=0A= +=09/*=0A+=09=20*=20Scan=20for=20a=20would-be-invalidated=20slot=20in=20= the=20conflicting=20database.=0A+=09=20*=0A+=09=20*=20Skip=20slots=20= that=20have=20not=20yet=20reached=20snapshot-builder=20consistency=0A+=09= =20*=20(effective_catalog_xmin=20is=20still=20InvalidTransactionId).=20= An=20in-progress=0A+=09=20*=20slot=20has=20not=20produced=20any=20output=20= to=20a=20consumer,=20so=20invalidating=20it=20is=0A+=09=20*=20harmless=20= =E2=80=94=20the=20caller=20can=20retry.=20Pausing=20for=20such=20a=20= slot=20would=0A+=09=20*=20deadlock:=20DecodingContextFindStartpoint=20= would=20be=20waiting=20for=20replay=0A+=09=20*=20to=20advance,=20while=20= replay=20would=20be=20waiting=20for=20the=20slot=20to=20be=20drained.=0A= +=09=20*/=0A+=09LWLockAcquire(ReplicationSlotControlLock,=20LW_SHARED);=0A= +=09for=20(i=20=3D=200;=20i=20<=20max_replication_slots;=20i++)=0A+=09{=0A= +=09=09ReplicationSlot=20*s=20=3D=20= &ReplicationSlotCtl->replication_slots[i];=0A+=09=09Oid=09=09=09slot_db;=0A= +=09=09TransactionId=20slot_xmin;=0A+=09=09TransactionId=20= slot_effective_xmin;=0A+=09=09bool=09=09active_logical;=0A+=0A+=09=09if=20= (!s->in_use)=0A+=09=09=09continue;=0A+=0A+=09=09= SpinLockAcquire(&s->mutex);=0A+=09=09slot_db=20=3D=20s->data.database;=0A= +=09=09slot_xmin=20=3D=20s->data.catalog_xmin;=0A+=09=09= slot_effective_xmin=20=3D=20s->effective_catalog_xmin;=0A+=09=09/*=0A+=09= =09=20*=20Skip=20synced=20slots=20(managed=20by=20the=20slot-sync=20= worker=20per=0A+=09=09=20*=20sync_replication_slots).=20Writing=20their=20= fields=20from=20the=20startup=0A+=09=09=20*=20process=20would=20race=20= with=20the=20slot-sync=20worker's=20own=20updates,=20and=0A+=09=09=20*=20= the=20operator-facing=20"drain=20or=20drop=20the=20slot"=20recipe=20in=20= the=0A+=09=09=20*=20errhint=20below=20cannot=20be=20applied=20to=20a=20= synced=20slot=20(ALTER=20/=0A+=09=09=20*=20DROP_REPLICATION_SLOT=20error=20= on=20synced).=0A+=09=09=20*/=0A+=09=09active_logical=20=3D=20= (s->data.invalidated=20=3D=3D=20RS_INVAL_NONE=20&&=0A+=09=09=09=09=09=09=20= =20slot_db=20!=3D=20InvalidOid=20&&=0A+=09=09=09=09=09=09=20=20= TransactionIdIsValid(slot_effective_xmin)=20&&=0A+=09=09=09=09=09=09=20=20= !s->data.synced);=0A+=09=09SpinLockRelease(&s->mutex);=0A+=0A+=09=09if=20= (!active_logical)=0A+=09=09=09continue;=0A+=09=09if=20(slot_db=20!=3D=20= dboid)=0A+=09=09=09continue;=0A+=09=09if=20= (!TransactionIdIsValid(slot_xmin))=0A+=09=09=09continue;=0A+=09=09/*=0A+=09= =09=20*=20Use=20PrecedesOrEquals=20(not=20Precedes)=20to=20match=20the=20= check=20in=0A+=09=09=20*=20DetermineSlotInvalidationCause.=20Otherwise=20= a=20slot=20whose=0A+=09=09=20*=20catalog_xmin=20was=20just=20advanced=20= to=20exactly=20conflict_horizon=20by=0A+=09=09=20*=20a=20previous=20= pause-and-advance=20cycle=20(our=20own=20resume=20code)=20will=0A+=09=09=20= *=20NOT=20trigger=20a=20pause=20here=20when=20the=20next=20prune=20= record=20arrives=0A+=09=09=20*=20with=20horizon=20=3D=3D=20catalog_xmin,=20= yet=20WILL=20still=20be=20invalidated=0A+=09=09=20*=20by=20the=20= fall-through=20InvalidateObsoleteReplicationSlots=20call.=0A+=09=09=20*/=0A= +=09=09if=20(TransactionIdPrecedesOrEquals(slot_xmin,=20= snapshotConflictHorizon))=0A+=09=09{=0A+=09=09=09would_invalidate=20=3D=20= true;=0A+=09=09=09break;=0A+=09=09}=0A+=09}=0A+=09= LWLockRelease(ReplicationSlotControlLock);=0A+=0A+=09if=20= (!would_invalidate)=0A+=09=09return;=0A+=0A+=09ereport(LOG,=0A+=09=09=09= (errmsg("recovery=20paused:=20WAL=20redo=20at=20%X/%X=20would=20= invalidate=20a=20logical=20replication=20slot",=0A+=09=09=09=09=09= LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))),=0A+=09=09=09=20= errdetail("snapshotConflictHorizon=20%u=20exceeds=20catalog_xmin=20of=20= at=20least=20one=20active=20logical=20slot=20in=20database=20%u.",=0A+=09= =09=09=09=09=20=20=20snapshotConflictHorizon,=20dboid),=0A+=09=09=09=20= errhint("Drain,=20advance,=20or=20drop=20the=20slot,=20then=20execute=20= pg_wal_replay_resume().")));=0A+=0A+=09SetRecoveryPause(true);=0A+=0A+=09= while=20(GetRecoveryPauseState()=20!=3D=20RECOVERY_NOT_PAUSED)=0A+=09{=0A= +=09=09ProcessStartupProcInterrupts();=0A+=0A+=09=09/*=0A+=09=09=20*=20= If=20the=20operator=20gave=20up=20on=20the=20slot=20and=20triggered=20a=20= promotion=0A+=09=09=20*=20instead,=20bail=20out=20of=20the=20wait=20so=20= the=20startup=20process=20can=20proceed=0A+=09=09=20*=20with=20the=20= promotion=20path.=20Must=20use=20CheckForStandbyTrigger=20(which=0A+=09=09= =20*=20actually=20consumes=20PROMOTE_SIGNAL_FILE),=20not=20= PromoteIsTriggered=0A+=09=09=20*=20(which=20only=20reads=20a=20flag=20= populated=20by=20the=20former).=20Mirrors=20the=0A+=09=09=20*=20same=20= escape=20in=20recoveryPausesHere().=0A+=09=09=20*/=0A+=09=09if=20= (CheckForStandbyTrigger())=0A+=09=09{=0A+=09=09=09= ConditionVariableCancelSleep();=0A+=09=09=09return;=0A+=09=09}=0A+=0A+=09= =09/*=0A+=09=09=20*=20Promote=20RECOVERY_PAUSE_REQUESTED=20to=20= RECOVERY_PAUSED=20so=20that=0A+=09=09=20*=20observers=20= (pg_get_wal_replay_pause_state()=20/=20monitoring)=20see=20the=0A+=09=09=20= *=20pause=20as=20actually=20taken,=20not=20just=20requested.=0A+=09=09=20= */=0A+=09=09ConfirmRecoveryPaused();=0A+=09=09= ConditionVariableTimedSleep(&XLogRecoveryCtl->recoveryNotPausedCV,=0A+=09= =09=09=09=09=09=09=09=091000,=20WAIT_EVENT_RECOVERY_PAUSE);=0A+=09}=0A+=09= ConditionVariableCancelSleep();=0A+=0A+=09/*=0A+=09=20*=20Operator=20has=20= resumed.=20If=20they=20drained=20slot(s)=20up=20to=20(or=20past)=20the=20= LSN=0A+=09=20*=20of=20the=20about-to-be-replayed=20conflict=20record,=20= we=20trust=20that=20the=20consumer=0A+=09=20*=20downstream=20has=20= captured=20everything=20that=20needed=20the=20pre-conflict=20catalog=0A+=09= =20*=20snapshot.=20Advance=20those=20slots'=20catalog_xmin=20past=20the=20= horizon=20so=20the=0A+=09=20*=20subsequent=20= InvalidateObsoleteReplicationSlots()=20fall-through=20is=20a=0A+=09=20*=20= no-op.=20Slots=20that=20the=20operator=20did=20NOT=20drain=20are=20left=20= alone=20and=20get=0A+=09=20*=20invalidated=20normally=20=E2=80=94=20that=20= is=20the=20"I=20didn't=20act,=20just=20let=20the=20slot=0A+=09=20*=20= die"=20path.=0A+=09=20*=0A+=09=20*=20"Drained=20past=20the=20conflict=20= LSN"=20is=20defined=20as:=20the=20slot's=0A+=09=20*=20= confirmed_flush_lsn=20>=3D=20the=20LSN=20at=20which=20replay=20has=20= paused,=20which=20is=0A+=09=20*=20the=20current=20replay=20position=20= reported=20by=20GetXLogReplayRecPtr.=0A+=09=20*/=0A+=09{=0A+=09=09= XLogRecPtr=09conflict_lsn=20=3D=20GetXLogReplayRecPtr(NULL);=0A+=09=09= int=09=09=09j;=0A+=0A+=09=09LWLockAcquire(ReplicationSlotControlLock,=20= LW_EXCLUSIVE);=0A+=09=09for=20(j=20=3D=200;=20j=20<=20= max_replication_slots;=20j++)=0A+=09=09{=0A+=09=09=09ReplicationSlot=20= *s=20=3D=20&ReplicationSlotCtl->replication_slots[j];=0A+=09=09=09bool=09= =09advance;=0A+=0A+=09=09=09if=20(!s->in_use)=0A+=09=09=09=09continue;=0A= +=0A+=09=09=09SpinLockAcquire(&s->mutex);=0A+=09=09=09/*=0A+=09=09=09=20= *=20Skip=20synced=20slots=20=E2=80=94=20same=20reason=20as=20in=20the=20= pause-check=20scan.=0A+=09=09=09=20*=20Writing=20their=20fields=20would=20= race=20the=20slot-sync=20worker.=0A+=09=09=09=20*/=0A+=09=09=09advance=20= =3D=20(s->data.invalidated=20=3D=3D=20RS_INVAL_NONE=20&&=0A+=09=09=09=09=09= =20=20=20s->data.database=20=3D=3D=20dboid=20&&=0A+=09=09=09=09=09=20=20=20= !s->data.synced=20&&=0A+=09=09=09=09=09=20=20=20s->data.confirmed_flush=20= >=3D=20conflict_lsn=20&&=0A+=09=09=09=09=09=20=20=20= ((TransactionIdIsValid(s->data.catalog_xmin)=20&&=0A+=09=09=09=09=09=09=20= TransactionIdPrecedesOrEquals(s->data.catalog_xmin,=0A+=09=09=09=09=09=09= =09=09=09=09=09=09=09=20=20=20snapshotConflictHorizon))=20||=0A+=09=09=09= =09=09=09(TransactionIdIsValid(s->data.xmin)=20&&=0A+=09=09=09=09=09=09=20= TransactionIdPrecedesOrEquals(s->data.xmin,=0A+=09=09=09=09=09=09=09=09=09= =09=09=09=09=20=20=20snapshotConflictHorizon))));=0A+=09=09=09if=20= (advance)=0A+=09=09=09{=0A+=09=09=09=09TransactionId=20new_xmin=20=3D=20= snapshotConflictHorizon;=0A+=0A+=09=09=09=09= TransactionIdAdvance(new_xmin);=09/*=20strictly=20>=20horizon=20*/=0A+=09= =09=09=09if=20(TransactionIdIsValid(s->data.catalog_xmin)=20&&=0A+=09=09=09= =09=09TransactionIdPrecedesOrEquals(s->data.catalog_xmin,=0A+=09=09=09=09= =09=09=09=09=09=09=09=09=20=20snapshotConflictHorizon))=0A+=09=09=09=09{=0A= +=09=09=09=09=09s->data.catalog_xmin=20=3D=20new_xmin;=0A+=09=09=09=09=09= s->effective_catalog_xmin=20=3D=20new_xmin;=0A+=09=09=09=09}=0A+=09=09=09= =09if=20(TransactionIdIsValid(s->data.xmin)=20&&=0A+=09=09=09=09=09= TransactionIdPrecedesOrEquals(s->data.xmin,=0A+=09=09=09=09=09=09=09=09=09= =09=09=09=20=20snapshotConflictHorizon))=0A+=09=09=09=09{=0A+=09=09=09=09= =09s->data.xmin=20=3D=20new_xmin;=0A+=09=09=09=09=09s->effective_xmin=20= =3D=20new_xmin;=0A+=09=09=09=09}=0A+=09=09=09=09s->just_dirtied=20=3D=20= true;=0A+=09=09=09=09s->dirty=20=3D=20true;=0A+=09=09=09}=0A+=09=09=09= SpinLockRelease(&s->mutex);=0A+=0A+=09=09=09if=20(advance)=0A+=09=09=09=09= ereport(LOG,=0A+=09=09=09=09=09=09(errmsg("advanced=20catalog_xmin=20of=20= logical=20slot=20\"%s\"=20past=20conflict=20horizon=20%u",=0A+=09=09=09=09= =09=09=09=09NameStr(s->data.name),=20snapshotConflictHorizon),=0A+=09=09=09= =09=09=09=20errdetail("Slot's=20confirmed_flush_lsn=20%X/%X=20reached=20= the=20conflict=20record=20at=20%X/%X;=20operator=20drained=20before=20= resuming.",=0A+=09=09=09=09=09=09=09=09=20=20=20= LSN_FORMAT_ARGS(s->data.confirmed_flush),=0A+=09=09=09=09=09=09=09=09=20=20= =20LSN_FORMAT_ARGS(conflict_lsn))));=0A+=09=09}=0A+=09=09= LWLockRelease(ReplicationSlotControlLock);=0A+=0A+=09=09/*=0A+=09=09=20*=20= Flush=20dirty=20slots=20to=20disk=20immediately=20=E2=80=94=20do=20not=20= defer=20to=20the=20next=0A+=09=09=20*=20restartpoint.=20=20The=20normal=20= streaming=20path=20(LogicalConfirmReceivedLocation=0A+=09=09=20*=20in=20= logical.c)=20saves=20slot=20state=20to=20disk=20*before*=20updating=20= the=0A+=09=09=20*=20in-memory=20effective_catalog_xmin.=20=20We=20must=20= uphold=20the=20same=20invariant:=0A+=09=09=20*=20if=20we=20crash=20= between=20the=20in-memory=20advance=20above=20and=20the=20next=0A+=09=09=20= *=20restartpoint,=20the=20on-disk=20catalog_xmin=20must=20reflect=20the=20= advance=20so=0A+=09=09=20*=20that=20vacuum=20cannot=20reclaim=20catalog=20= tuples=20the=20slot=20still=20needs.=0A+=09=09=20*/=0A+=09=09= CheckPointReplicationSlots(false);=0A+=09}=0A=20}=0A=20=0A=20/*=0Adiff=20= --git=20a/src/backend/utils/misc/guc_parameters.dat=20= b/src/backend/utils/misc/guc_parameters.dat=0Aindex=20= afaa058b046..52b34443ec6=20100644=0A---=20= a/src/backend/utils/misc/guc_parameters.dat=0A+++=20= b/src/backend/utils/misc/guc_parameters.dat=0A@@=20-2441,6=20+2441,14=20= @@=0A=20=20=20max=20=3D>=20'INT_MAX',=0A=20},=0A=20=0A+{=20name=20=3D>=20= 'recovery_pause_on_logical_slot_conflict',=20type=20=3D>=20'bool',=0A+=20= =20context=20=3D>=20'PGC_SIGHUP',=20group=20=3D>=20= 'REPLICATION_STANDBY',=0A+=20=20short_desc=20=3D>=20'Pauses=20recovery=20= instead=20of=20invalidating=20an=20active=20logical=20slot=20on=20= catalog=20conflict.',=0A+=20=20long_desc=20=3D>=20'When=20WAL=20replay=20= on=20a=20standby=20is=20about=20to=20invalidate=20a=20logical=20= replication=20slot=20because=20a=20catalog=20PRUNE_ON_ACCESS=20record=20= has=20overtaken=20the=20slot\'s=20catalog_xmin,=20pause=20recovery=20= instead.=20The=20operator=20can=20then=20drain=20or=20drop=20the=20slot=20= and=20call=20pg_wal_replay_resume()=20to=20continue.',=0A+=20=20variable=20= =3D>=20'recovery_pause_on_logical_slot_conflict',=0A+=20=20boot_val=20=3D>= =20'false',=0A+},=0A+=0A=20{=20name=20=3D>=20'recovery_prefetch',=20type=20= =3D>=20'enum',=20context=20=3D>=20'PGC_SIGHUP',=20group=20=3D>=20= 'WAL_RECOVERY',=0A=20=20=20short_desc=20=3D>=20'Prefetch=20referenced=20= blocks=20during=20recovery.',=0A=20=20=20long_desc=20=3D>=20'Look=20= ahead=20in=20the=20WAL=20to=20find=20references=20to=20uncached=20= data.',=0Adiff=20--git=20a/src/backend/utils/misc/postgresql.conf.sample=20= b/src/backend/utils/misc/postgresql.conf.sample=0Aindex=20= ac38cddaaf9..17b2bcc4df8=20100644=0A---=20= a/src/backend/utils/misc/postgresql.conf.sample=0A+++=20= b/src/backend/utils/misc/postgresql.conf.sample=0A@@=20-403,6=20+403,8=20= @@=0A=20#wal_retrieve_retry_interval=20=3D=205s=20=20=20=20=20=20=20#=20= time=20to=20wait=20before=20retrying=20to=0A=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20#=20retrieve=20WAL=20after=20a=20failed=20attempt=0A=20= #recovery_min_apply_delay=20=3D=200=20=20=20=20=20=20=20=20=20=20=20#=20= minimum=20delay=20for=20applying=20changes=20during=20recovery=0A= +#recovery_pause_on_logical_slot_conflict=20=3D=20off=20=20#=20pause=20= recovery=20instead=20of=20invalidating=0A+=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20#=20a=20logical=20slot=20on=20catalog=20conflict=0A=20= #sync_replication_slots=20=3D=20off=20=20=20=20=20=20=20=20=20=20=20#=20= enables=20slot=20synchronization=20on=20the=20physical=20standby=20from=20= the=20primary=0A=20=0A=20#=20-=20Subscribers=20-=0Adiff=20--git=20= a/src/include/access/xlogrecovery.h=20= b/src/include/access/xlogrecovery.h=0Aindex=201683ec14a5a..58d578ce853=20= 100644=0A---=20a/src/include/access/xlogrecovery.h=0A+++=20= b/src/include/access/xlogrecovery.h=0A@@=20-129,6=20+129,7=20@@=20extern=20= PGDLLIMPORT=20XLogRecoveryCtlData=20*XLogRecoveryCtl;=0A=20extern=20= PGDLLIMPORT=20bool=20recoveryTargetInclusive;=0A=20extern=20PGDLLIMPORT=20= int=20recoveryTargetAction;=0A=20extern=20PGDLLIMPORT=20int=20= recovery_min_apply_delay;=0A+extern=20PGDLLIMPORT=20bool=20= recovery_pause_on_logical_slot_conflict;=0A=20extern=20PGDLLIMPORT=20= char=20*PrimaryConnInfo;=0A=20extern=20PGDLLIMPORT=20char=20= *PrimarySlotName;=0A=20extern=20PGDLLIMPORT=20char=20= *recoveryRestoreCommand;=0Adiff=20--git=20= a/src/include/storage/standby.h=20b/src/include/storage/standby.h=0A= index=206a314c693cd..03705a8eb8c=20100644=0A---=20= a/src/include/storage/standby.h=0A+++=20b/src/include/storage/standby.h=0A= @@=20-75,6=20+75,8=20@@=20extern=20void=20= ResolveRecoveryConflictWithSnapshot(TransactionId=20snapshotConflictHo=0A= =20extern=20void=20= ResolveRecoveryConflictWithSnapshotFullXid(FullTransactionId=20= snapshotConflictHorizon,=0A=20=09=09=09=09=09=09=09=09=09=09=09=09=09=20=20= =20bool=20isCatalogRel,=0A=20=09=09=09=09=09=09=09=09=09=09=09=09=09=20=20= =20RelFileLocator=20locator);=0A+extern=20void=20= MaybePauseOnLogicalSlotConflict(Oid=20dboid,=0A+=09=09=09=09=09=09=09=09=09= =09=09TransactionId=20snapshotConflictHorizon);=0A=20extern=20void=20= ResolveRecoveryConflictWithTablespace(Oid=20tsid);=0A=20extern=20void=20= ResolveRecoveryConflictWithDatabase(Oid=20dbid);=0A=20=0Adiff=20--git=20= a/src/test/recovery/meson.build=20b/src/test/recovery/meson.build=0A= index=209eb8ed11425..ae794cb5bae=20100644=0A---=20= a/src/test/recovery/meson.build=0A+++=20b/src/test/recovery/meson.build=0A= @@=20-62,6=20+62,7=20@@=20tests=20+=3D=20{=0A=20=20=20=20=20=20=20= 't/051_effective_wal_level.pl',=0A=20=20=20=20=20=20=20= 't/052_checkpoint_segment_missing.pl',=0A=20=20=20=20=20=20=20= 't/053_standby_login_event_trigger.pl',=0A+=20=20=20=20=20=20= 't/054_recovery_pause_on_slot_conflict.pl',=0A=20=20=20=20=20],=0A=20=20=20= },=0A=20}=0Adiff=20--git=20= a/src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=20= b/src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=0Anew=20= file=20mode=20100644=0Aindex=2000000000000..d1a03475e95=0A---=20= /dev/null=0A+++=20= b/src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl=0A@@=20-0,0=20= +1,329=20@@=0A+#=20Copyright=20(c)=202026,=20PostgreSQL=20Global=20= Development=20Group=0A+=0A+#=20Exercise=20the=20= recovery_pause_on_logical_slot_conflict=20GUC=20on=20a=20standby.=0A+#=0A= +#=20Two-phase=20flow=20so=20the=20slot=20is=20fully=20consistent=20= BEFORE=20any=20catalog-=0A+#=20prune=20WAL=20record=20is=20replayed=20= =E2=80=94=20otherwise=20slot=20creation=20would=20block=0A+#=20inside=20= DecodingContextFindStartpoint=20while=20replay=20pauses=20on=20the=0A+#=20= prune,=20and=20we=20would=20deadlock.=20(Fix=20#1,=20bbd5d4e13bc,=20= narrows=20the=0A+#=20window=20but=20doesn't=20remove=20it;=20keeping=20= the=20two-phase=20flow=20explicit=0A+#=20makes=20the=20test=20robust.)=0A= +#=0A+#=20Phase=201=20=E2=80=94=20bring=20up=20a=20consistent=20logical=20= slot=20on=20the=20standby=20from=20a=0A+#=20quiet=20primary=20archive:=0A= +#=20=20=20*=20take=20basebackup=0A+#=20=20=20*=20= pg_log_standby_snapshot()=20=E2=86=92=20snapbuild=20path=20(a)=20anchor=0A= +#=20=20=20*=20wait=20for=20the=20snapshot's=20segment=20to=20archive=0A= +#=20=20=20*=20start=20standby,=20let=20replay=20catch=20up,=20create=20= slot=20(quick=20=E2=80=94=20no=0A+#=20=20=20=20=20prune=20records=20in=20= the=20archive=20yet).=0A+#=0A+#=20Phase=202=20=E2=80=94=20churn=20the=20= primary's=20catalog=20so=20the=20standby's=20replay=0A+#=20eventually=20= hits=20a=20catalog-prune=20record=20that=20would=20invalidate=20the=0A+#=20= slot:=0A+#=20=20=20*=20run=20CREATE=20/=20DROP=20of=20transient=20tables=20= (pg_class=20churn)=0A+#=20=20=20*=20run=20ANALYZE=20x2=20+=20VACUUM=20= pg_statistic=20/=20pg_class=20(HOT=20prune=20on=0A+#=20=20=20=20=20= catalog=20relations=20in=20db=3Dpostgres)=0A+#=20=20=20*=20wait=20for=20= those=20segments=20to=20archive=0A+#=20=20=20*=20orchestrator=20loop=20= on=20the=20standby:=20when=0A+#=20=20=20=20=20= pg_get_wal_replay_pause_state()=20returns=20paused,=20drain=20the=20slot=0A= +#=20=20=20=20=20via=20pg_logical_slot_get_changes,=20call=20= pg_wal_replay_resume,=0A+#=20=20=20=20=20continue.=0A+=0A+use=20strict;=0A= +use=20warnings=20FATAL=20=3D>=20'all';=0A+=0A+use=20= PostgreSQL::Test::Cluster;=0A+use=20PostgreSQL::Test::Utils;=0A+use=20= Test::More;=0A+use=20Time::HiRes=20qw(usleep);=0A+=0A+#=20= ---------------------------------------------------------------------=0A= +#=20Helpers=0A+#=20= ---------------------------------------------------------------------=0A= +=0A+#=20Build=20the=20primary,=20seed=20the=20workload=20table,=20take=20= a=20basebackup,=20and=0A+#=20produce=20a=20"clean"=20archive:=20one=20= that=20contains=20a=20standby=20snapshot=20but=0A+#=20no=20catalog-prune=20= WAL=20yet.=20Returns=20($node_primary,=20$backup_name).=0A+sub=20= setup_primary_with_clean_archive=0A+{=0A+=20=20=20=20my=20$node_primary=20= =3D=20PostgreSQL::Test::Cluster->new('primary');=0A+=20=20=20=20= $node_primary->init(allows_streaming=20=3D>=20'logical',=20has_archiving=20= =3D>=201);=0A+=20=20=20=20$node_primary->append_conf('postgresql.conf',=20= qq[=0A+wal_level=20=3D=20logical=0A+archive_mode=20=3D=20on=0A= +archive_timeout=20=3D=201s=0A+autovacuum=20=3D=20on=0A= +autovacuum_naptime=20=3D=205s=0A+fsync=20=3D=20off=0A= +synchronous_commit=20=3D=20off=0A+]);=0A+=20=20=20=20= $node_primary->start;=0A+=0A+=20=20=20=20= $node_primary->safe_psql('postgres',=20qq[=0A+=20=20=20=20=20=20=20=20= CREATE=20TABLE=20events=20(id=20serial=20PRIMARY=20KEY,=20payload=20= text);=0A+=20=20=20=20=20=20=20=20ALTER=20TABLE=20events=20REPLICA=20= IDENTITY=20FULL;=0A+=20=20=20=20=20=20=20=20INSERT=20INTO=20events=20= (payload)=20VALUES=20('seed');=0A+=20=20=20=20]);=0A+=0A+=20=20=20=20my=20= $backup_name=20=3D=20'backup1';=0A+=20=20=20=20= $node_primary->backup($backup_name);=0A+=0A+=20=20=20=20#=20Quiet-moment=20= RUNNING_XACTS=20in=20post-backup=20WAL=20=E2=80=94=20provides=20path=20= (a)=0A+=20=20=20=20#=20anchor=20for=20snapbuild.=0A+=20=20=20=20= $node_primary->safe_psql('postgres',=20"SELECT=20= pg_log_standby_snapshot();");=0A+=0A+=20=20=20=20#=20Force=20the=20= segment=20containing=20that=20anchor=20to=20archive=20so=20the=20standby=0A= +=20=20=20=20#=20can=20see=20it=20via=20restore_command.=20Switch=20= TWICE:=20first=20switch=20closes=0A+=20=20=20=20#=20the=20segment=20with=20= the=20snapshot=20record;=20second=20switch=20gives=0A+=20=20=20=20#=20= snapbuild=20the=20forward=20WAL=20it=20needs=20to=20decide=20the=20slot=20= is=0A+=20=20=20=20#=20consistent.=20Without=20the=20second=20switch,=0A+=20= =20=20=20#=20DecodingContextFindStartpoint=20blocks=20on=20'waiting=20= for=20WAL=20to=0A+=20=20=20=20#=20become=20available=20at=20seg=20N+1'=20= =E2=80=94=20flaky=20slot=20creation.=0A+=20=20=20=20my=20$phase1_seg=20=3D= =20$node_primary->safe_psql('postgres',=0A+=20=20=20=20=20=20=20=20= "SELECT=20pg_walfile_name(pg_current_wal_lsn())");=0A+=20=20=20=20= $node_primary->safe_psql('postgres',=20"SELECT=20pg_switch_wal();");=0A+=20= =20=20=20$node_primary->safe_psql('postgres',=20"SELECT=20= pg_log_standby_snapshot();");=0A+=20=20=20=20= $node_primary->safe_psql('postgres',=20"SELECT=20pg_switch_wal();");=0A+=20= =20=20=20$node_primary->poll_query_until('postgres',=20qq[=0A+=20=20=20=20= =20=20=20=20SELECT=20last_archived_wal=20IS=20NOT=20NULL=0A+=20=20=20=20=20= =20=20=20=20=20=20AND=20last_archived_wal=20>=3D=20'$phase1_seg'=0A+=20=20= =20=20=20=20=20=20FROM=20pg_stat_archiver=0A+=20=20=20=20])=20or=20die=20= "Timed=20out=20waiting=20for=20phase-1=20segment=20$phase1_seg=20to=20= archive";=0A+=0A+=20=20=20=20return=20($node_primary,=20$backup_name);=0A= +}=0A+=0A+#=20Bring=20up=20an=20archive-only=20standby=20from=20= $backup_name=20on=20$node_primary=0A+#=20with=20= recovery_pause_on_logical_slot_conflict=20set=20to=20$guc_value.=20Waits=0A= +#=20for=20replay=20to=20catch=20up,=20then=20returns=20the=20node.=0A= +sub=20create_archive_standby=0A+{=0A+=20=20=20=20my=20($node_primary,=20= $backup_name,=20$name,=20$guc_value)=20=3D=20@_;=0A+=0A+=20=20=20=20my=20= $standby=20=3D=20PostgreSQL::Test::Cluster->new($name);=0A+=20=20=20=20= $standby->init_from_backup($node_primary,=20$backup_name,=0A+=20=20=20=20= =20=20=20=20has_streaming=20=3D>=200,=20has_restoring=20=3D>=201);=0A+=20= =20=20=20$standby->append_conf('postgresql.conf',=20qq[=0A+hot_standby=20= =3D=20on=0A+recovery_pause_on_logical_slot_conflict=20=3D=20$guc_value=0A= +wal_level=20=3D=20logical=0A+max_standby_archive_delay=20=3D=20-1=0A= +max_standby_streaming_delay=20=3D=20-1=0A+]);=0A+=20=20=20=20= $standby->start;=0A+=20=20=20=20$standby->poll_query_until('postgres',=0A= +=20=20=20=20=20=20=20=20"SELECT=20pg_last_wal_replay_lsn()=20IS=20NOT=20= NULL",=20't');=0A+=0A+=20=20=20=20return=20$standby;=0A+}=0A+=0A+#=20= Churn=20the=20primary's=20catalog=20enough=20to=20emit=20catalog-prune=20= WAL=20records,=0A+#=20then=20force=20and=20wait=20for=20those=20records=20= to=20reach=20the=20archive.=0A+sub=20run_catalog_churn=0A+{=0A+=20=20=20=20= my=20($node_primary)=20=3D=20@_;=0A+=0A+=20=20=20=20#=20Transient=20= tables=20exercise=20pg_class=20/=20pg_attribute=20/=20pg_type=20/=20= pg_depend.=0A+=20=20=20=20$node_primary->safe_psql('postgres',=20qq[=0A+=20= =20=20=20=20=20=20=20INSERT=20INTO=20events=20(payload)=0A+=20=20=20=20=20= =20=20=20=20=20=20=20SELECT=20'row-'=20||=20g=20FROM=20= generate_series(1,=203000)=20g;=0A+=20=20=20=20]);=0A+=20=20=20=20for=20= (my=20$i=20=3D=200;=20$i=20<=2020;=20$i++)=20{=0A+=20=20=20=20=20=20=20=20= $node_primary->safe_psql('postgres',=0A+=20=20=20=20=20=20=20=20=20=20=20= =20"CREATE=20TABLE=20churn_$i=20(id=20int,=20payload=20text);=20DROP=20= TABLE=20churn_$i;");=0A+=20=20=20=20}=0A+=20=20=20=20#=20Two=20ANALYZE=20= calls=20make=20first-generation=20pg_statistic=20rows=20dead=20by=0A+=20=20= =20=20#=20overwriting=20them;=20VACUUM=20then=20emits=20= Heap2/PRUNE_ON_ACCESS.=0A+=20=20=20=20= $node_primary->safe_psql('postgres',=20qq[=0A+=20=20=20=20=20=20=20=20= ANALYZE=20events;=0A+=20=20=20=20=20=20=20=20ANALYZE=20events;=0A+=20=20=20= =20=20=20=20=20VACUUM=20pg_class;=0A+=20=20=20=20=20=20=20=20VACUUM=20= pg_attribute;=0A+=20=20=20=20=20=20=20=20VACUUM=20pg_type;=0A+=20=20=20=20= =20=20=20=20VACUUM=20pg_depend;=0A+=20=20=20=20=20=20=20=20VACUUM=20= pg_statistic;=0A+=20=20=20=20]);=0A+=0A+=20=20=20=20my=20$phase2_seg=20=3D= =20$node_primary->safe_psql('postgres',=0A+=20=20=20=20=20=20=20=20= "SELECT=20pg_walfile_name(pg_current_wal_lsn())");=0A+=20=20=20=20= $node_primary->safe_psql('postgres',=20"SELECT=20pg_switch_wal();");=0A+=20= =20=20=20$node_primary->poll_query_until('postgres',=20qq[=0A+=20=20=20=20= =20=20=20=20SELECT=20last_archived_wal=20IS=20NOT=20NULL=0A+=20=20=20=20=20= =20=20=20=20=20=20AND=20last_archived_wal=20>=3D=20'$phase2_seg'=0A+=20=20= =20=20=20=20=20=20FROM=20pg_stat_archiver=0A+=20=20=20=20])=20or=20die=20= "Timed=20out=20waiting=20for=20phase-2=20segment=20$phase2_seg=20to=20= archive";=0A+=0A+=20=20=20=20return;=0A+}=0A+=0A+#=20Orchestrator=20loop=20= for=20the=20GUC-on=20standby:=20when=20replay=20pauses,=20drain=0A+#=20= the=20slot=20via=20pg_logical_slot_get_changes=20and=20call=0A+#=20= pg_wal_replay_resume().=20Exits=20when=20replay=20stops=20advancing=20or=20= when=0A+#=20$deadline_seconds=20have=20passed.=20Returns=20= ($pauses_seen,=20$total_drained)=0A+#=20and=20includes=20a=20final=20= drain=20of=20anything=20left=20on=20the=20slot.=0A+sub=20= drain_and_resume_loop=0A+{=0A+=20=20=20=20my=20($standby,=20$slot_name,=20= $deadline_seconds)=20=3D=20@_;=0A+=0A+=20=20=20=20my=20$total_drained=20= =3D=200;=0A+=20=20=20=20my=20$pauses_seen=20=3D=200;=0A+=20=20=20=20my=20= $last_replay=20=3D=20'';=0A+=20=20=20=20my=20$stall_ticks=20=3D=200;=0A+=20= =20=20=20my=20$deadline=20=3D=20time()=20+=20$deadline_seconds;=0A+=0A+=20= =20=20=20while=20(time()=20<=20$deadline)=20{=0A+=20=20=20=20=20=20=20=20= my=20$state=20=3D=20$standby->safe_psql('postgres',=0A+=20=20=20=20=20=20= =20=20=20=20=20=20"SELECT=20pg_get_wal_replay_pause_state()");=0A+=20=20=20= =20=20=20=20=20my=20$replay=20=3D=20$standby->safe_psql('postgres',=0A+=20= =20=20=20=20=20=20=20=20=20=20=20"SELECT=20pg_last_wal_replay_lsn()");=0A= +=0A+=20=20=20=20=20=20=20=20if=20($state=20eq=20'paused'=20||=20$state=20= eq=20'pause=20requested')=20{=0A+=20=20=20=20=20=20=20=20=20=20=20=20my=20= $got=20=3D=20$standby->safe_psql('postgres',=0A+=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20"SELECT=20COUNT(*)=20FROM=20= pg_logical_slot_get_changes('$slot_name',=20NULL,=20NULL)");=0A+=20=20=20= =20=20=20=20=20=20=20=20=20$total_drained=20+=3D=20$got;=0A+=20=20=20=20=20= =20=20=20=20=20=20=20$pauses_seen++;=0A+=20=20=20=20=20=20=20=20=20=20=20= =20$standby->safe_psql('postgres',=20"SELECT=20pg_wal_replay_resume()");=0A= +=20=20=20=20=20=20=20=20=20=20=20=20$stall_ticks=20=3D=200;=0A+=20=20=20= =20=20=20=20=20}=20elsif=20($replay=20eq=20$last_replay)=20{=0A+=20=20=20= =20=20=20=20=20=20=20=20=20$stall_ticks++;=0A+=20=20=20=20=20=20=20=20=20= =20=20=20last=20if=20$stall_ticks=20>=2010;=0A+=20=20=20=20=20=20=20=20}=20= else=20{=0A+=20=20=20=20=20=20=20=20=20=20=20=20$stall_ticks=20=3D=200;=0A= +=20=20=20=20=20=20=20=20}=0A+=0A+=20=20=20=20=20=20=20=20$last_replay=20= =3D=20$replay;=0A+=20=20=20=20=20=20=20=20usleep(500_000);=0A+=20=20=20=20= }=0A+=0A+=20=20=20=20my=20$final=20=3D=20$standby->safe_psql('postgres',=0A= +=20=20=20=20=20=20=20=20"SELECT=20COUNT(*)=20FROM=20= pg_logical_slot_get_changes('$slot_name',=20NULL,=20NULL)");=0A+=20=20=20= =20$total_drained=20+=3D=20$final;=0A+=0A+=20=20=20=20return=20= ($pauses_seen,=20$total_drained);=0A+}=0A+=0A+#=20Poll=20until=20= $standby=20reports=20replay=20as=20paused,=20up=20to=20~30=20seconds.=0A= +#=20Returns=201=20on=20success,=200=20on=20timeout.=0A+sub=20= wait_for_replay_paused=0A+{=0A+=20=20=20=20my=20($standby)=20=3D=20@_;=0A= +=0A+=20=20=20=20for=20(my=20$i=20=3D=200;=20$i=20<=2060;=20$i++)=20{=0A= +=20=20=20=20=20=20=20=20my=20$s=20=3D=20$standby->safe_psql('postgres',=0A= +=20=20=20=20=20=20=20=20=20=20=20=20"SELECT=20= pg_get_wal_replay_pause_state()");=0A+=20=20=20=20=20=20=20=20return=201=20= if=20$s=20eq=20'paused';=0A+=20=20=20=20=20=20=20=20usleep(500_000);=0A+=20= =20=20=20}=0A+=20=20=20=20return=200;=0A+}=0A+=0A+#=20= ---------------------------------------------------------------------=0A= +#=20Main=20script=0A+#=20= ---------------------------------------------------------------------=0A= +=0A+#=201.=20GUC=20visibility.=0A+my=20($node_primary,=20$backup_name)=20= =3D=20setup_primary_with_clean_archive();=0A+=0A+my=20$guc=20=3D=20= $node_primary->safe_psql('postgres',=0A+=20=20=20=20"SELECT=20COUNT(*)=20= FROM=20pg_settings=20WHERE=20name=20=3D=20= 'recovery_pause_on_logical_slot_conflict'");=0A+is($guc,=20'1',=20= 'recovery_pause_on_logical_slot_conflict=20GUC=20is=20registered');=0A+=0A= +#=202.=20Phase=201:=20bring=20up=20BOTH=20standbys=20(GUC-on=20and=20= GUC-off)=20while=20the=0A+#=20archive=20still=20contains=20only=20the=20= quiet-moment=20snapshot=20=E2=80=94=20no=20prune=0A+#=20records=20yet.=20= Slot=20creation=20reaches=20SNAPBUILD_CONSISTENT=20quickly=20on=0A+#=20= both.=20Later,=20when=20Phase=202=20ships=20the=20prune=20records,=20the=20= two=20standbys=0A+#=20diverge:=20the=20GUC-on=20one=20pauses=20and=20= drains;=20the=20GUC-off=20one=0A+#=20invalidates.=0A+my=20$node_standby=20= =3D=20create_archive_standby($node_primary,=20$backup_name,=0A+=20=20=20=20= 'standby',=20'on');=0A+my=20$node_standby_off=20=3D=20= create_archive_standby($node_primary,=20$backup_name,=0A+=20=20=20=20= 'standby_off',=20'off');=0A+=0A+$node_standby->safe_psql('postgres',=20= qq[=0A+=20=20=20=20SELECT=20pg_create_logical_replication_slot('t_slot',=20= 'test_decoding');=0A+]);=0A+$node_standby_off->safe_psql('postgres',=20= qq[=0A+=20=20=20=20SELECT=20= pg_create_logical_replication_slot('t_slot_off',=20'test_decoding');=0A= +]);=0A+=0A+my=20$slot_ready=20=3D=20= $node_standby->safe_psql('postgres',=20qq[=0A+=20=20=20=20SELECT=20= wal_status=20FROM=20pg_replication_slots=20WHERE=20slot_name=20=3D=20= 't_slot'=0A+]);=0A+is($slot_ready,=20'reserved',=20"slot=20created=20= cleanly=20in=20Phase=201=20(state:=20$slot_ready)");=0A+=0A+my=20= $off_slot_ready=20=3D=20$node_standby_off->safe_psql('postgres',=20qq[=0A= +=20=20=20=20SELECT=20wal_status=20FROM=20pg_replication_slots=20WHERE=20= slot_name=20=3D=20't_slot_off'=0A+]);=0A+is($off_slot_ready,=20= 'reserved',=0A+=20=20=20"baseline=20slot=20created=20cleanly=20in=20= Phase=201=20(state:=20$off_slot_ready)");=0A+=0A+#=203.=20Phase=202:=20= catalog=20churn=20on=20primary,=20then=20wait=20for=20archive.=0A= +run_catalog_churn($node_primary);=0A+=0A+#=204.=20Orchestrator=20loop=20= on=20the=20GUC-on=20standby.=0A+my=20($pauses_seen,=20$total_drained)=20= =3D=0A+=20=20=20=20drain_and_resume_loop($node_standby,=20't_slot',=20= 60);=0A+=0A+my=20$slot_state=20=3D=20= $node_standby->safe_psql('postgres',=20qq[=0A+=20=20=20=20SELECT=20= wal_status=20||=20'|'=20||=20COALESCE(invalidation_reason,=20'')=0A+=20=20= =20=20FROM=20pg_replication_slots=20WHERE=20slot_name=20=3D=20't_slot';=0A= +]);=0A+like($slot_state,=20qr/^reserved\|/,=0A+=20=20=20=20=20"slot=20= survived=20catalog=20prune=20with=20GUC=20on=20(state:=20$slot_state)");=0A= +=0A+cmp_ok($pauses_seen,=20'>=3D',=201,=0A+=20=20=20=20"at=20least=20= one=20pause=20event=20was=20handled=20($pauses_seen=20seen)");=0A+=0A= +cmp_ok($total_drained,=20'>=3D',=202000,=0A+=20=20=20=20"at=20least=20= 2000=20decoded=20events=20($total_drained=20got)");=0A+=0A+#=205.=20= Baseline=20assertion:=20the=20GUC-off=20standby,=20faced=20with=20the=20= exact=20same=0A+#=20Phase-2=20archive,=20should=20invalidate=20its=20= slot.=20This=20confirms=20the=20test=0A+#=20setup=20actually=20triggers=20= the=20conflict=20AND=20that=20GUC-off=20behavior=20is=0A+#=20unchanged=20= from=20upstream=20=E2=80=94=20if=20this=20ever=20starts=20passing=20with=20= state=0A+#=20"reserved",=20either=20the=20test=20stopped=20reproducing=20= the=20trigger=20or=20the=0A+#=20GUC-off=20path=20accidentally=20benefits=20= from=20our=20patch.=0A+my=20$off_state=20=3D=20'reserved';=0A+for=20(my=20= $i=20=3D=200;=20$i=20<=2060;=20$i++)=20{=0A+=20=20=20=20$off_state=20=3D=20= $node_standby_off->safe_psql('postgres',=20qq[=0A+=20=20=20=20=20=20=20=20= SELECT=20wal_status=20FROM=20pg_replication_slots=20WHERE=20slot_name=20= =3D=20't_slot_off';=0A+=20=20=20=20]);=0A+=20=20=20=20last=20if=20= $off_state=20eq=20'lost';=0A+=20=20=20=20usleep(500_000);=0A+}=0A+=0A= +is($off_state,=20'lost',=0A+=20=20=20"baseline=20(GUC=20off):=20slot=20= invalidates=20as=20expected=20under=20catalog=20prune");=0A+=0A+#=206.=20= Promote-during-pause:=20bring=20up=20a=20third=20standby,=20get=20it=20= paused=20by=0A+#=20the=20GUC,=20then=20call=20pg_promote()=20and=20= assert=20promotion=20actually=0A+#=20completes=20(rather=20than=20= stalling=20until=20someone=20also=20runs=0A+#=20pg_wal_replay_resume).=20= Guards=20the=20CheckForStandbyTrigger()=20escape=0A+#=20path=20in=20the=20= wait=20loop.=0A+my=20$node_standby_p=20=3D=20= create_archive_standby($node_primary,=20$backup_name,=0A+=20=20=20=20= 'standby_promote',=20'on');=0A+$node_standby_p->safe_psql('postgres',=0A= +=20=20=20=20"SELECT=20= pg_create_logical_replication_slot('promote_slot',=20'test_decoding')");=0A= +=0A+#=20Phase-2=20archive=20is=20already=20shipped=20so=20a=20pause=20= will=20happen=20within=20a=0A+#=20few=20seconds.=0A+my=20$paused=20=3D=20= wait_for_replay_paused($node_standby_p);=0A+ok($paused,=20"promote-test=20= standby=20reached=20paused=20state=20before=20promotion");=0A+=0A+#=20= Call=20pg_promote=20with=20a=20short=20wait.=20Without=20the=20= CheckForStandbyTrigger=0A+#=20escape=20in=20the=20wait=20loop,=20this=20= stalls=20for=20the=20full=20wait_seconds=20and=0A+#=20returns=20false;=20= with=20the=20fix,=20it=20returns=20true=20in=20~1=20second.=0A+my=20$t0=20= =3D=20time();=0A+my=20$promoted=20=3D=20= $node_standby_p->safe_psql('postgres',=0A+=20=20=20=20"SELECT=20= pg_promote(wait=20=3D>=20true,=20wait_seconds=20=3D>=2030)");=0A+my=20= $elapsed=20=3D=20time()=20-=20$t0;=0A+is($promoted,=20't',=20"pg_promote=20= returned=20true=20while=20standby=20was=20paused=20by=20GUC");=0A= +cmp_ok($elapsed,=20'<',=2010,=0A+=20=20=20=20"pg_promote=20completed=20= in=20under=2010s=20(actual:=20${elapsed}s)");=0A+=0A= +$node_standby_p->stop;=0A+$node_standby_off->stop;=0A= +$node_standby->stop;=0A+$node_primary->stop;=0A+=0A+done_testing();=0A= --=20=0A2.50.1=20(Apple=20Git-155)=0A=0A= --Apple-Mail=_7CF3B2EF-A258-40FC-8984-55A2CDC2E2FE--