From: Ashutosh Sharma
Date: Thu, 26 Feb 2026 10:28:31 +0530
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
To: SATYANARAYANA NARLAPURAM
Cc: PostgreSQL-development, PostgreSQL Hackers

Hi,

On Wed, Feb 25, 2026 at 7:21 PM Ashutosh Sharma wrote:
>
> Hi Satya,
>
> On Wed, Feb 25, 2026 at 3:38 AM SATYANARAYANA NARLAPURAM wrote:
> >
> > Hi hackers,
> >
> > synchronized_standby_slots requires that every physical slot listed in
> > the GUC has caught up before a logical failover slot is allowed to
> > proceed with decoding. This is an ALL-of-N slots semantic. The logical
> > slot availability model does not align with quorum replication
> > semantics set using synchronous_standby_names which can be configured
> > for quorum commit (ANY M of N).
> >
> > In a typical 3 Node HA deployment with quorum sync rep:
> >
> > Primary, standby1 (corresponds to sb1_slot), standby2 (corresponds to sb2_slot)
> > synchronized_standby_slots = ' sb1_slot, sb2_slot'
> > synchronous_standby_names = 'Any 1 ('standby1','standby2')'
> >
> > If standby1 goes down, synchronous commits still succeed because
> > standby2 satisfies the quorum. However, logical decoding blocks
> > indefinitely in WaitForStandbyConfirmation(), waiting for sb1_slot
> > (corresponds to standby1) to catch up -- even though the transaction
> > is already safely committed on a quorum of synchronous standbys. This
> > blocks logical decoding consumers from progressing and is inconsistent
> > with the availability guarantee the DBA intended by choosing quorum
> > commit.
>
> +1. This can indeed be a blocker for failover enabled logical
> replication. It not only has the potential to disrupt logical
> replication, but can also impact the primary server. Over time, it may
> silently lead to significant WAL accumulation on the primary,
> eventually causing disk-full scenarios and degrading the performance
> of applications running on the primary instance. Therefore, I too
> strongly believe this needs to be addressed to prevent such
> potentially disruptive situations.
>
> >
> > Proposal:
> >
> > Make synchronized_standby_slots quorum aware i.e. extend the GUC to
> > accept an ANY M (slot1, slot2, ...) syntax similar to
> > synchronous_standby_names, so StandbySlotsHaveCaughtup() can return
> > true when M of N slots (where M <= N and M >= 1) have caught up. I
> > still prefer two different GUCs for this as the list of slots to be
> > synchronized can still be different (for example, DBA may want to
> > ensure Geo standby to be sync before allowing the logical decoding
> > client to read the changes). I kept synchronized_standby_slots parse
> > logic similar to synchronous_standby_names to keep things simple.
> > The default behavior is also not changed for synchronized_standby_slots.
> >
>
> Thank you for the proposal. I can spend some time reviewing the
> changes and help take this forward. I would also be happy to hear
> others' thoughts and feedback on the proposal.
>

Thinking about this further, using quorum settings for
synchronized_standby_slots can (and eventually will) leave at least one
sync standby lagging behind the logical replica, which would probably
make it impossible to continue with the existing logical replication
setup after a failover to that lagging standby. Here is what I mean:

Let's say we have 2 synchronous standbys with
"synchronized_standby_slots" configured as ANY 1 (sync_standby1,
sync_standby2). With this quorum setting, WAL only needs to be
confirmed by any one of the two standbys before it can be forwarded to
the logical replica.

Now consider a scenario where sync_standby1 is ahead of sync_standby2:
new WAL gets confirmed by sync_standby1 and is subsequently delivered
to the logical replica. If sync_standby1 then goes down and we fail
over to sync_standby2, the new primary will be at a lower LSN than the
logical replica, since sync_standby2 never received that WAL. At this
point, the logical replication slot on the new primary is essentially
stale, and the logical replication setup that existed before the
failover cannot be resumed.

Hence, I think it's important to ensure that the WAL (including all
the data needed for logical replication) gets delivered to all the
servers/slots specified in synchronized_standby_slots before it gets
delivered to the logical replica. While I agree that not allowing
quorum-like settings here has the potential to accumulate WAL and
impact logical replication, I think we can explore other ways to
mitigate that concern separately.

Let's see what the experts have to say on this.

--
With Regards,
Ashutosh Sharma.
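
P.S. To make the ANY 1 failover scenario concrete, here is a toy sketch
in Python. It is not PostgreSQL code: LSNs are modelled as plain
integers, and the Standby class, the logical_send_limit() helper and
the LSN values are all hypothetical names and numbers chosen for
illustration. It only shows how far the logical replica may be sent WAL
under ALL-of-N versus a proposed ANY 1 rule, and where the new primary
ends up after failing over to the lagging standby.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    name: str
    confirmed_lsn: int  # highest WAL position confirmed by this standby's slot (toy integer)


def logical_send_limit(standbys, required):
    """Return the highest LSN confirmed by at least `required` standbys.

    required == len(standbys) models the current ALL-of-N behaviour;
    required == 1 models the proposed (hypothetical) ANY 1 setting.
    """
    lsns = sorted((s.confirmed_lsn for s in standbys), reverse=True)
    return lsns[required - 1]


# sync_standby1 is ahead of sync_standby2 (values are made up).
standbys = [Standby("sync_standby1", 300), Standby("sync_standby2", 200)]

all_limit = logical_send_limit(standbys, required=len(standbys))
any1_limit = logical_send_limit(standbys, required=1)

# sync_standby1 goes down and we fail over to sync_standby2.
new_primary_lsn = standbys[1].confirmed_lsn

# Is the logical replica ahead of the new primary after failover?
replica_ahead_with_all = all_limit > new_primary_lsn
replica_ahead_with_any1 = any1_limit > new_primary_lsn

print("ALL-of-N send limit:", all_limit, "replica ahead:", replica_ahead_with_all)
print("ANY 1 send limit:", any1_limit, "replica ahead:", replica_ahead_with_any1)
```

With ALL-of-N the replica can never get past the slowest listed slot,
so it cannot be ahead of whichever standby is promoted. With ANY 1 it
can be sent WAL that the promoted standby never received.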