From: Fujii Masao <[email protected]>
To: Shinya Kato <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
Date: Tue, 10 Mar 2026 10:54:13 +0900
Message-ID: <CAHGQGwEmMBBAE0RG-R3_LacfT4fbB55qGE6n9O5mNwrqvbNBtw@mail.gmail.com>
In-Reply-To: <CAOzEurQiP3uebd1GMiC1Dzf5VJwF4ZBEpJ6QYQFE6Y+rVjxqNA@mail.gmail.com>
References: <CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com>
<CAHGQGwE=kyQ+YnGPn8zpZ959+3ywg8OR_Nu__uXxxuE0E+Y_Zg@mail.gmail.com>
<CAOzEurRGiGE2Dfe+ySpb=+93ku=7ZC6RgAbHtLC6Xsq3g2XexA@mail.gmail.com>
<CAHGQGwH2h_R7FWPvEs3+NWLwHZoj9r96tUyRKi5haqxMc6FXiQ@mail.gmail.com>
<CAOzEurQiP3uebd1GMiC1Dzf5VJwF4ZBEpJ6QYQFE6Y+rVjxqNA@mail.gmail.com>
On Tue, Mar 10, 2026 at 10:02 AM Shinya Kato <[email protected]> wrote:
>
> On Mon, Mar 9, 2026 at 8:21 PM Fujii Masao <[email protected]> wrote:
> > > The attached v2 patch takes a different approach: it additionally
> > > requires that all reported positions (write/flush/apply) remain
> > > unchanged from the previous reply. This directly detects a truly idle
> > > system without relying on timeouts—if any position has advanced, new
> > > WAL activity must have occurred, so we should not clear the lag values
> > > even if the lag tracker is empty.
> >
> > This approach looks good to me.
>
> Thank you for looking into this.
>
> > One comment: currently, the lag becomes NULL basically after about one
> > wal_receiver_status_interval during periods of no activity. OTOH, with this
> > approach, it seems it would take about twice wal_receiver_status_interval.
> > Is this understanding correct?
>
> Exactly. With this patch, it takes about two
> wal_receiver_status_interval cycles to show NULL instead of one. I
> think this is an acceptable trade-off because it is better to take a
> bit longer to detect inactivity than to incorrectly show NULL during
> active replication.

Even with your latest patch, if we remove fullyAppliedLastTime and set
clearLagTimes to true when applyPtr == sentPtr && noLagSamples &&
positionsUnchanged, wouldn't the time for the lag to become NULL be
almost the same as wal_receiver_status_interval?
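
For illustration, the condition could look roughly like the sketch below.
This is a standalone approximation, not code from the patch: PrevReply and
should_clear_lag_times are hypothetical names, XLogRecPtr is a stand-in
typedef, and noLagSamples is assumed to be computed by the caller (true
when the lag tracker returned no sample for write, flush, or apply).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical holder for the positions reported in the previous reply. */
typedef struct PrevReply
{
	XLogRecPtr	write;
	XLogRecPtr	flush;
	XLogRecPtr	apply;
} PrevReply;

/*
 * Decide whether the lag values should be cleared for the current reply,
 * then remember the reply's positions for the next call.
 *
 * There is no fullyAppliedLastTime flag here: the decision depends only on
 * the current reply and the positions saved from the previous one.
 */
bool
should_clear_lag_times(PrevReply *prev,
					   XLogRecPtr sentPtr,
					   XLogRecPtr writePtr,
					   XLogRecPtr flushPtr,
					   XLogRecPtr applyPtr,
					   bool noLagSamples)
{
	/* True only if no reported position advanced since the last reply. */
	bool		positionsUnchanged = (writePtr == prev->write &&
									  flushPtr == prev->flush &&
									  applyPtr == prev->apply);

	bool		clearLagTimes = (applyPtr == sentPtr) &&
		noLagSamples &&
		positionsUnchanged;

	/* Save the positions for comparison with the next reply. */
	prev->write = writePtr;
	prev->flush = flushPtr;
	prev->apply = applyPtr;

	return clearLagTimes;
}
```

In this sketch, the first reply that reports newly advanced positions never
clears the lag, and the next reply with unchanged positions does, provided
the standby is fully caught up and no lag samples are pending.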

The documentation doesn't clearly specify how long it should take for
the lag to become NULL, so doubling that time might be acceptable.
However, if we can keep it roughly the same without much complexity,
I think that would be preferable.

Thoughts?

--
Fujii Masao