Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w6zux-004rHC-1q for pgsql-hackers@arkaria.postgresql.org; Sun, 29 Mar 2026 23:51:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w6zuw-0003UZ-0M for pgsql-hackers@arkaria.postgresql.org; Sun, 29 Mar 2026 23:51:42 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w6zuv-0003UR-2G for pgsql-hackers@lists.postgresql.org; Sun, 29 Mar 2026 23:51:42 +0000 Received: from mail-vk1-xa34.google.com ([2607:f8b0:4864:20::a34]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w6zut-00000001iV4-49bD for pgsql-hackers@lists.postgresql.org; Sun, 29 Mar 2026 23:51:41 +0000 Received: by mail-vk1-xa34.google.com with SMTP id 71dfb90a1353d-5637886c92aso1847738e0c.0 for ; Sun, 29 Mar 2026 16:51:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774828299; cv=none; d=google.com; s=arc-20240605; b=ORJW0uJKphrc7bzIep6AuPy/dRahIVjGxrM2ngCfL9nw4SWiTYcrht2uLcvq9O//Nv JC8CZROY/iJHW/7wd3TiSTeBukxy2UGAuDODjQF7zTz1x+VlQEPDK2m42MJb3J8itdsP I7bDFj9eGWLSIubI2uF9ZY+i0ZtQEqxbu22y+stkLAwy8T7Wsvg4uHfsXuQl7Q+1boEC rFW4HJXLi4r/0HPU4v8Ol5JpJmAlURExnSZ/P0d+Hy3Rbum0k41LSEOJZGa8IWCH8wwI RPOTWMaWx3UOD6ZS6Cj9Fsm5NvgUEL2s/TaaNnVJfmHQEaSA+ppc2JzQDWHcvHQN+7tT Rc/Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=5dnoLlg96/bCNZT4iPhCe7wmG0gq63M/WC1V3DmkMUg=; fh=eGhD8NE58EmIgD6b+gPO+CvU80MO/Q+yy1QKus3+4M4=; b=VhvY66Lm4UWERbK76svtFsKm5il3mBETI9Xk+dZM/7Hnohoms638j7cA/Ib0DEgy9Y +z6MLZdtIXnYhz272vxvteI40QbX9182Ky1ePveh0aFSBhLBn1/k8Xb4vFmolIVFR8aK 0PZbk2zxB6bt3Q2+nfM24k3iV8HlP2+yiaSa/Ur3jKzf2Lv2Q10ZrrHTqOn47/4HQR5G Zyb1Q9dYrnPd13omNb3BcvgNb1Wzqo2oFs3QO1G8YH9CCtmk7HyTZ+l2hxf5gwC5S4ax tz8YzRVhNR5qOuoiG/bDL5kphbYj0m1zlanFxfPOwxvM7aXsk7+ho2TKuqUj941YE76B /Z+w==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774828299; x=1775433099; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=5dnoLlg96/bCNZT4iPhCe7wmG0gq63M/WC1V3DmkMUg=; b=tKgMQXibmYQnm0RCoESlEoe2myuiHI6ahGV2Kap2hIVy5BgmIGJTS/2DliE4YnZ+mv 2wQZiQleRnhqXHdsps1ArcFtOxQ1xdS1dSBKxhGzdG6cTkNpJiI3vvsYEJlFHdZG5zT2 J6m41YJNGDasABN+vn548UowHRw7fwf2ovMgBSUFcDS9qbpaHZgxaxsKVlNjddz50mO4 7OQ1GNt3+cWatiMQ3w9ngDWT91jHt3OsaLkLOb8/UlerZJRqNElL+2ZvdOmd6SJVwi8L 38/Rzanrl1o5PasJhnpRTN1cSNeAa5l2GB2RmGAem1VMlfK0wzsAHqP5TNm3OqBs6G4k 3oHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774828299; x=1775433099; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=5dnoLlg96/bCNZT4iPhCe7wmG0gq63M/WC1V3DmkMUg=; b=XHncu86TX6mXX9AhKafjuk2CH90tot8kCpAUnFYEN+o/+9vN2HljQ2o69W2b4F97F0 Lj9vx893bNePdaZaqs1uCj4DCA5l+oUWjdco8hf2s7TST/dsLBDUxVA58ZEb9NyEmDxP S5rx9yC1f53VBw0c3FTYi9FEoIi9T16lBuheDerL3jYiF3WXEBI7mGtCNP2vNSsc+y9t 07ZBTgHTvBr7J6d0wJZ1/HpVbmo36lYvhac5o7jYPxcCxfl9TNt5rB8w62HzorBpzHRu Q/J5BdmVf/6NazfUVwud2PwDnBDG/l95iRl1xr1rlwU9FbVX/711dTYJBRslUBFfsy6e LM/w== X-Gm-Message-State: AOJu0YzY3EZ8+w2pQWj8txYFMtvY78cZbSxgoVxvAnVk91pfycGlKsHt AmbwGzSa/CNb1iNDaYWSqAq2bYSDEIcZwcuU/b13xnIFo93Sb9En6QsAC2WKFNs9f+fzxf0zdtq ArWTbmyX8YXjJxdC14a7hmdNpWZXSfh0= X-Gm-Gg: ATEYQzyxFv5ga14ej5mG8Jl4I+0J2NBsZnqVp8ECO6lq/6pSaYzzxVjDSXxU0gROHme uNgT5dXID00AKA1EfvC7UYwf4a+lEKMH/R4x93HoWu9n2tnZh3WONeUtA2LkdLeeKWmKrWCi47Y /5c4u9uwu2dOq1vyyPoJ7Y9qEhqH+J3k8sj5V/Se89EyvjJraIe+EN3mRlObav3Y/st7ozzrkbN n0UfqNFBdO2MeuLP7A9Xqd49RAQ11luTCbvtt/Eo45KDQ8CpIij3oVYpa9B+0L18+aoxgMclXRT duziHH4= X-Received: by 2002:a05:6102:52:b0:605:23e7:30a4 with SMTP id ada2fe7eead31-60523e744f1mr935816137.27.1774828298790; Sun, 29 Mar 2026 16:51:38 -0700 (PDT) MIME-Version: 1.0 References: <126eb1e4-d98e-4647-b629-517adbcad28e@uni-muenster.de> In-Reply-To: From: SATYANARAYANA NARLAPURAM Date: Sun, 29 Mar 2026 16:51:27 -0700 X-Gm-Features: AQROBzCXCdAvkW_dB0F1hJ-6Zj18JI7eyHdGKsEQDksqLpZu_iqt1qQ7zhsglK4 Message-ID: Subject: Re: Add max_wal_replay_size connection parameter to libpq To: Jim Jones Cc: PostgreSQL Hackers Content-Type: multipart/alternative; boundary="000000000000496c40064e326aa9" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000496c40064e326aa9 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, On Sun, Mar 29, 2026 at 11:53=E2=80=AFAM Jim Jones wrote: > > > On 29/03/2026 20:31, SATYANARAYANA NARLAPURAM wrote: > > What if none of them meets the criteria? You fail the connection? > > Wouldn't it cause an availability issue? > > > Yes, the connection fails if no host meets the threshold. This is > intentional, and it is consistent with the existing behaviour of > target_session_attrs: if you set target_session_attrs=3Dstandby and no > standby is reachable, the connection fails too. > > > > If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver > due to > > missing primary_conninfo or a disconnected upstream), the backlog > cannot > > be determined. In that case, the standby is treated as exceeding th= e > > threshold and is skipped. > > > > > > When a standby is replaying archiving log, it can still be caught up. > > This doesn't seem right to me. > > > I totally see your point here. The issue is that > pg_last_wal_receive_lsn() returns NULL when there is no WAL receiver > process -- regardless of how current the data actually is. Without a > receive LSN, the metric this parameter is based on (receive_lsn - > replay_lsn) is simply undefined for that standby. > > Please let me know if I am missing something here. > > > > > > This parameter measures only the apply lag on the standby itself, > i.e., > > how much already-received WAL remains to be replayed. It does not > > attempt to measure how far the standby is behind the primary. In > > particular, a standby that is slow to receive WAL but fast to repla= y > it > > may report a small backlog here while still being significantly > behind. > > > > > > IMHO, this change appears to not meet the objective of routing > > connections/queries to the most up-to-date standby. > > > The parameter's objective is not to route to the most up-to-date > standby; it is to skip standbys whose apply lag exceeds a given threshold= . > What is the expectation from such a routing? Is it for freshness of data for the client or freeing up the standby from user connections so that it can catch up with primary? The paragraph described originally was talking about the freshness. Thanks, Satya --000000000000496c40064e326aa9 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

On Sun, Mar 29, 2026 at 11:5= 3=E2=80=AFAM Jim Jones <jim= .jones@uni-muenster.de> wrote:


On 29/03/2026 20:31, SATYANARAYANA NARLAPURAM wrote:
> What if none of them meets the criteria? You fail the connection?
> Wouldn't it cause an availability issue?


Yes, the connection fails if no host meets the threshold. This is
intentional, and it is consistent with the existing behaviour of
target_session_attrs: if you set target_session_attrs=3Dstandby and no
standby is reachable, the connection fails too.


>=C2=A0 =C2=A0 =C2=A0If pg_last_wal_receive_lsn() is NULL (e.g. no activ= e WAL receiver due to
>=C2=A0 =C2=A0 =C2=A0missing primary_conninfo or a disconnected upstream= ), the backlog cannot
>=C2=A0 =C2=A0 =C2=A0be determined. In that case, the standby is treated= as exceeding the
>=C2=A0 =C2=A0 =C2=A0threshold and is skipped.
>
>
> When a standby is replaying archiving log, it can still be caught up.<= br> > This doesn't seem right to me.


I totally see your point here. The issue is that
pg_last_wal_receive_lsn() returns NULL when there is no WAL receiver
process -- regardless of how current the data actually is. Without a
receive LSN, the metric this parameter is based on (receive_lsn -
replay_lsn) is simply undefined for that standby.

Please let me know if I am missing something here.


>
>=C2=A0 =C2=A0 =C2=A0This parameter measures only the apply lag on the s= tandby itself, i.e.,
>=C2=A0 =C2=A0 =C2=A0how much already-received WAL remains to be replaye= d. It does not
>=C2=A0 =C2=A0 =C2=A0attempt to measure how far the standby is behind th= e primary. In
>=C2=A0 =C2=A0 =C2=A0particular, a standby that is slow to receive WAL b= ut fast to replay it
>=C2=A0 =C2=A0 =C2=A0may report a small backlog here while still being s= ignificantly behind.
>
>
> IMHO, this change appears to not meet the objective of routing
> connections/queries to the most up-to-date standby.


The parameter's objective is not to route to the most up-to-date
standby; it is to skip standbys whose apply lag exceeds a given threshold.<= br>

What is the expectation from such a rou= ting? Is it for freshness of data for the client or=C2=A0
freeing= up the standby=C2=A0 from user connections so that it can catch up with pr= imary?
The paragraph described originally was talking about the f= reshness.
=C2=A0
Thanks,
Satya
--000000000000496c40064e326aa9--