From: shveta malik
Date: Fri, 5 Jun 2026 09:06:32 +0530
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
To: "Zhijie Hou (Fujitsu)"
Cc: Ashutosh Sharma, Amit Kapila, Ajin Cherian, SATYANARAYANA NARLAPURAM, PostgreSQL-development, PostgreSQL Hackers, shveta malik

On Fri, Jun 5, 2026 at 8:34 AM Zhijie Hou (Fujitsu) wrote:
>
> On Thursday, June 4, 2026 5:27 PM Ashutosh Sharma wrote:
> > On Thu, Jun 4, 2026 at 1:54 PM Zhijie Hou (Fujitsu) wrote:
> > >
> > > On Thursday, June 4, 2026 3:36 PM Ashutosh Sharma wrote:
> > > > On Thu, Jun 4, 2026 at 9:14 AM shveta malik wrote:
> > > > > My preference, and original intent, was to accept duplicate entries
> > > > > and skip them internally. Doc can be updated to say 'duplicate entries
> > > > > are skipped'. A server startup failure due to duplicate entries in a
> > > > > GUC does not seem right to me.
> > > > > If the alter-system command fails due
> > > > > to duplicate entries, that is still fine, but a startup failure seems
> > > > > excessive. But let's see what others have to say on this.
> > > > >
> > > >
> > > > Okay, the attached patch adds the capability to automatically remove
> > > > duplicate entries from the synchronized_standby_slots list.
> > >
> > > Thanks for updating the patch.
> > >
> > > I agree with Shveta that reporting an ERROR is not ideal. I also think it (ERROR) would
> > > be inconsistent with existing GUCs, as most of them, such as
> > > synchronous_standby_names, search_path, and session_preload_libraries, do not
> > > enforce uniqueness.
> > >
> > > The most similar GUC, synchronous_standby_names, also clarifies this in the
> > > documentation:
> > >
> > > " There is no mechanism to enforce uniqueness of standby names. In case of
> > > duplicates one of the matching standbys will be considered as higher priority,
> > > though exactly which one is indeterminate."[1]
> > >
> > > > In N of M
> > > > mode, if N > M after removing duplicate entries, an error is raised.
> > >
> > > I'm not entirely sure about this case. It seems similar to when the number of
> > > specified slots is less than N (in ANY N or FIRST N), given that we want to skip
> > > duplicate slots. In that situation, the natural behavior to me would be to
> > > simply block replication rather than raise an error. And
> > > synchronous_standby_names would also simply block the transaction in this case.
> > >
> >
> > For duplicate entries themselves, I agree with the direction of not
> > raising an error. Silently normalizing duplicates is reasonable for
> > this GUC, especially if we document it clearly. A repeated slot name
> > does not add any new information, so treating it as “same slot listed
> > twice by mistake” is practical.
> >
> > But for N > M after deduplication, I would still lean toward raising an error.
> >
> > Why I’d separate those cases:
> >
> > 1) Duplicate entries looks like a harmless normalization problem. ANY
> > 2 (a, a, b) can be normalized to ANY 2 (a, b) without changing the
> > user’s apparent intent much.
> >
> > 2) N > M after deduplication is not a transient runtime state. ANY 2
> > (a, a) becomes one unique slot. That configuration can never succeed
> > unless the config itself changes. Blocking forever turns a static
> > configuration mistake into an operational liveness problem.
> >
> > 3) N > M after deduplication is different from ordinary “not enough
> > standbys are currently available”. If we configure ANY 2 (a, b) and
> > only a is currently caught up, blocking makes sense because the
> > situation may resolve at runtime. If we configure ANY 2 (a, a) and
> > duplicates are ignored, there is no possible future runtime in which
> > it succeeds without editing the GUC. That is why I think erroring is
> > better.
> >
> > On the synchronous_standby_names comparison, I do not think it is
> > fully analogous. The quoted documentation is about there being no
> > reliable way to enforce uniqueness of standby names in the live
> > system, because those names are matched against runtime standbys and
> > the result can be indeterminate. Here, synchronized_standby_slots
> > names concrete replication slots, which are stable object identifiers.
> > Duplicate config entries are detectable and normalizable
> > deterministically at GUC parse time. That gives us a cleaner option
> > than synchronous_standby_names has.
>
> Thanks for the explanation.
>
> What I was wondering is: ignoring duplicates, what should be the behavior of
> "ANY 2 (standby)" when N > M?
>
> I studied a bit for the behavior of synchronous_standby_names to understand the
> difference. synchronous_standby_names does support syntax like "ANY 2 (standby)"
> where N > M.
> Because even in that case, a transaction can still commit if there
> are two standbys with the same name ("standby" in this example). I'm not sure
> how common that use case is, but it may explain why no error is reported.
>
> Given that, I'm not opposed to reporting an error in synchronized_standby_slots
> when N > M. The situation is different here since there cannot be two slots with
> the same name, making this a completely invalid use case.
>

I also think we can report an error when N > M. IIRC, we were also
reporting it earlier (without removing duplicates). Once duplicates are
removed, we can follow the same behaviour instead of leaving the
walsender stuck indefinitely.

thanks
Shveta
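
For illustration only, the behaviour discussed above (skip duplicate slot names, then report an error if N > M) could be sketched roughly as below. This is not the patch's code: it is plain Python rather than the server's C, the function name parse_slot_spec is hypothetical, and the accepted syntax is a simplified assumption covering only "ANY N (...)" and "FIRST N (...)".

```python
import re

# Assumed, simplified grammar: "ANY N (a, b, ...)" or "FIRST N (a, b, ...)".
# The real GUC parser may accept more forms; this is only a sketch.
_SPEC_RE = re.compile(r"^\s*(ANY|FIRST)\s+(\d+)\s*\(([^()]*)\)\s*$", re.IGNORECASE)


def parse_slot_spec(spec):
    """Return (mode, num_required, unique_slot_names) for a slot list.

    Duplicate names are skipped silently, keeping the first occurrence.
    A ValueError is raised if N exceeds the number of unique names,
    because such a configuration could never be satisfied at runtime.
    """
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid slot list syntax: {spec!r}")

    mode = match.group(1).upper()
    num_required = int(match.group(2))
    names = [name.strip() for name in match.group(3).split(",") if name.strip()]

    # dict.fromkeys preserves insertion order, so this drops later duplicates.
    unique_names = list(dict.fromkeys(names))

    if num_required > len(unique_names):
        raise ValueError(
            f"{mode} {num_required} requested, but only "
            f"{len(unique_names)} unique slot name(s) are listed"
        )
    return mode, num_required, unique_names
```

With this sketch, "ANY 2 (a, a, b)" is normalized to ("ANY", 2, ["a", "b"]), while "ANY 2 (a, a)" raises an error at parse time instead of blocking later.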