public inbox for [email protected]
help / color / mirror / Atom feedFrom: SATYANARAYANA NARLAPURAM <[email protected]>
To: Jim Jones <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Add max_wal_replay_size connection parameter to libpq
Date: Sun, 29 Mar 2026 11:31:49 -0700
Message-ID: <CAHg+QDeO4cbOKJSx0+yzD00a_1xv_z9dLrE4nKkxb1A+KEoKeg@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
Hi Jim,
On Sun, Mar 29, 2026 at 10:56 AM Jim Jones <[email protected]>
wrote:
> Hi,
>
> When connecting with target_session_attrs=standby (or prefer-standby,
> read-only, any) and multiple standbys are available, libpq currently
> selects the first acceptable candidate without regard for how "current"
> its data is. A standby configured with recovery_min_apply_delay,
> experiencing slow I/O, or otherwise lagging is treated the same as one
> that is fully caught up.
>
> I would like to propose a new libpq connection parameter,
> max_wal_replay_size, that allows clients to skip standby servers whose
> WAL replay backlog exceeds a given threshold.
>
> Example:
>
> psql "host=host1,host2,host3 port=5111,5222,5333 \
> target_session_attrs=standby max_wal_replay_size=16MB"
>
> When this parameter is set, libpq executes a small query during
> connection establishment to evaluate:
>
> pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
>
> on the standby. If the result exceeds the specified threshold, the
> server is skipped and the next host in the list is tried. The check is
> skipped entirely when target_session_attrs is set to primary or
> read-write, since those modes already exclude standbys.
>
What if none of them meets the criteria? You fail the connection? Wouldn't
it cause an availability issue?
>
> If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
> missing primary_conninfo or a disconnected upstream), the backlog cannot
> be determined. In that case, the standby is treated as exceeding the
> threshold and is skipped.
>
When a standby is replaying archiving log, it can still be caught up. This
doesn't seem right to me.
>
> This parameter measures only the apply lag on the standby itself, i.e.,
> how much already-received WAL remains to be replayed. It does not
> attempt to measure how far the standby is behind the primary. In
> particular, a standby that is slow to receive WAL but fast to replay it
> may report a small backlog here while still being significantly behind.
>
IMHO, this change appears to not meet the objective of routing
connections/queries to the most up-to-date standby.
Thanks,
Satya
view thread (4+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Add max_wal_replay_size connection parameter to libpq
In-Reply-To: <CAHg+QDeO4cbOKJSx0+yzD00a_1xv_z9dLrE4nKkxb1A+KEoKeg@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox