Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vx4VU-00Dnhy-07 for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Mar 2026 14:44:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vx4VS-001Ql3-0J for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Mar 2026 14:44:22 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vx4VR-001Qku-2b for pgsql-hackers@lists.postgresql.org; Mon, 02 Mar 2026 14:44:22 +0000 Received: from mail-oi1-x236.google.com ([2607:f8b0:4864:20::236]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vx4VQ-000000001tv-0038 for pgsql-hackers@lists.postgresql.org; Mon, 02 Mar 2026 14:44:22 +0000 Received: by mail-oi1-x236.google.com with SMTP id 5614622812f47-45f053b7b90so3012681b6e.0 for ; Mon, 02 Mar 2026 06:44:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772462657; cv=none; d=google.com; s=arc-20240605; b=hE/Wi4iaDkBtucVLP6SjQuGHA86G5Gdb+YhNCen0je0q43K/S+Z8nkmDUs+VEPwToT HvTjPW2iHVVDCqOn/BB+CpCOnBUcUBLc0OKPlXVMRgvXhC8odN13zbAZ56osz2Dq65En 3RjAZzlZvjjEK/yegvfO3Rzqc3Aq8pKHxbj/ixsS49ZbGJu4iPnsxYeW2GNwPNdDo2pp Q0HIiRRiSUhOMfgowOZACB2noqMGm/irB/R193cN2pWJbzovo+4ZU6I5P81rnfrP0x/y nWDQ2gF6do2IL3TzqjTmVWJ8i12lBIY746qmLBpCgdyHWvYBrjdKnZrN7/1TDkiikeNN OIYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=wmlU20vS1nT6gNFvpjeUNGHZOGgn839wKXjqK7DVKbE=; fh=SprTjyyUNMFOaPK7zUpaC1EEovCYhXioP7Zs2WMpltw=; b=HOaaPEkSwG3rynhFbjjXmqKGkSR6buFMhQEzKBS6I8hkLWBxPUD7eV4KvVa9UZFpvG l+GRkD9KrBc5qBukCXd4C6pW+kNgz8eBf7UM0dYEtTZekjAqvEcKQB70bA2zui2y+rj6 4cz9RIWwtYDPHGZlMuWb/g4x4zAhzbw2IVh41VfZJ+ff4bO9k3DItFBt2lghXFpiWxok vw/q1XxfpQRwGUTZWJuqVOpY98QccKhT85aBBewIVbsNrSQAj7rAvSr4HL/b276OmmQZ OMWF1Fc/30TntmHY9WxYT0J01wxhK4tkmycsq6YNjBSBbfBSo0IJP3Y8GRTlq5m48ioZ rqsg==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772462657; x=1773067457; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=wmlU20vS1nT6gNFvpjeUNGHZOGgn839wKXjqK7DVKbE=; b=UtKDJb6AmFbOaexWFocIOUXvOIX+R4b6IOOWFrEEZQULwa8RFxeBRtsnZk4NkR/ZnX 2eD2IwXiRpUa1LQZs8i5JrZqjQpy/u14zplfWxxfcR1u3Xl62rctVQVNjd/FPOVwCdhP AiURuHVEVR0f9tXhOGSq7R0jo8EtLTeIytwRNcdwT/ELa2i2q7BI5m22wIkx8if5Wz/Q BlDCGtwR/JFokwM4ELNurmzlvwE0BNYVpC9yQHElJeCSTAQl1i+oEwdAKPVpDqa5dDVu mA2mN+hQdPtZ8g0BSfuHz1owEtRSeuzgqGn14BfNCvEbDl/DVdKBiFkZ/B5dag3eKLzq SnnA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772462657; x=1773067457; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=wmlU20vS1nT6gNFvpjeUNGHZOGgn839wKXjqK7DVKbE=; b=vEPlbgFjbt/BdNSMXrHVD1VOu+SONCR7LtqaNsKUjH6meCxjIhJ8nshFebJyEWWMk4 BQbX9WQZcmcIHlhn1TFipeaW5JvhGKLXqoi6iD3uOOV9ZpfNzIxOxskLQjNs7xtW7+fT KIRmVWg4B1sflHQcv8BkSvezmH29v/3kwuHmlaP+JnUqfjZQUdT9PbeUodGwq/rIkhP/ oO8z7Nob4CavNNDSD77UkufMw+t7TS6363ztfQtk1uTcqc90vKvnD2OyJ4WcnAIPhNEO /BZt6RKSnzAdtpHy7bWoR0n6UWQqBFzDbGqLwmBmnmL03XaNIBoUBEpmKBiyc/ILCobJ B5Aw== X-Gm-Message-State: AOJu0Yyf//Dn1NZHctgwCPEnYTqC+zYmxJ38GOh7nW/oZy6ZRmS6qCKv VpmWXzekwSHHLytnakIzBTMRYDMtHqjSfAUpBtx/bVOEmvODPo3YWPcALYBYlY8ykfLa9AjQpo3 2TPNHVOYyngZgkLoc9w4ZDlTlZ1cP0+o= X-Gm-Gg: ATEYQzy7KM1iD+iPLWjKS1n26uRdH4dm20vFaObXODk/bl2kwDcY/CCyh+8u+oNI4bD +O2oXkwMt88I1C3914c5gJECYf8vJXO7k3XBCGfROPeZQ1W439aAA4+yr0aQfApxQL2LEpzM/0G +isVUGqbU2K3Bzm1I4vunAiwaYt6no9BbUsl+vhdHUSEz/cZN4G7X9ZUF11ZYGYEWkdt15jGC5D qFGuK22XyRS3/ToCu9/BFt00w2i77xVdDBIuB03pZnzXZNvOGHdvQChFG7jdOfdYojdzQ2irHit 5z/p1gY+IeBvBtZQyaQD/0gff0imZZQLXHlju/jzbMXJw3pFrylz X-Received: by 2002:a05:6820:1906:b0:662:f0cb:84bf with SMTP id 006d021491bc7-679faef920emr7646059eaf.36.1772462657087; Mon, 02 Mar 2026 06:44:17 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Fujii Masao Date: Mon, 2 Mar 2026 23:44:05 +0900 X-Gm-Features: AaiRm51fOMYNOM6B_KqbXwsa_YFinFFVZWyIbU5EqLr-Pbx_ayTwkm0aP0PctQs Message-ID: Subject: Re: pg_stat_replication.*_lag sometimes shows NULL during active replication To: Shinya Kato Cc: PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Feb 24, 2026 at 3:54=E2=80=AFPM Shinya Kato wrote: > > Hi hackers, > > I have noticed that pg_stat_replication.*_lag sometimes shows NULL > when inserting a record per second for health checking. This happens > when the startup process replays WAL fast enough before the > walreceiver sends its flush notification to the walsender. > > Here is the sequence that triggers the issue: (See normal.svg and > error.svg for diagrams of the normal and problematic cases.) > > 1. The walreceiver receives, writes, and flushes WAL, then wakes the > startup process via WakeupRecovery(). > > 2. The startup process replays all available WAL quickly, then calls > WalRcvForceReply() to set force_reply =3D true and wakes the > walreceiver. > > 3. The walreceiver sends a flush notification to the walsender > (XLogWalRcvSendReply() in XLogWalRcvFlush()). Since the startup has > already replayed the WAL by this point, this message reports the > incremented applyPtr, which equals sentPtr. The walsender processes > this message, consuming the LagTracker samples and setting > fullyAppliedLastTime =3D true. > > 4. In the next loop iteration, the walreceiver sees force_reply =3D true > and sends another reply with the same positions. The walsender sees > applyPtr =3D=3D sentPtr for the second consecutive time and sets > clearLagTimes =3D true. Since the LagTracker samples were already > consumed by step 3, all lag values are -1. With clearLagTimes =3D true, > these -1 values are written to walsnd->*Lag, causing > pg_stat_replication to show NULL. > > The comment in ProcessStandbyReplyMessage() says: > > * If the standby reports that it has fully replayed the WAL in two > * consecutive reply messages, then the second such message must resu= lt > * from wal_receiver_status_interval expiring on the standby. > > But as shown above, the second message can also come from > WalRcvForceReply(), violating this assumption. > > The attached patch fixes this by adding a check that all lag values > are -1 to the clearLagTimes condition. This ensures that clearLagTimes > only triggers when there are truly no new lag samples in two > consecutive messages (i.e., the system is genuinely idle), and not > when the samples were simply consumed by a preceding message in a > burst of replies. Thanks for the patch! With the patch applied, I set up a logical replication and inserted a row e= very second. Even with continuous inserts, NULL was shown in the lag columns of pg_stat_replication. That makes me wonder whether the patch's approach is sufficient to address the issue. Relying solely on replies from the standby or subscriber seems a bit fragil= e to me. If the goal is to keep showing the last measured lag for some time, perhaps we should introduce a rate limit on when NULL is displayed in the l= ag columns? For example, if there has been no activity (i.e., sentPtr =3D=3D applyPtr a= nd applyPtr has not changed since the previous cycle) for, say, 10 seconds, then we could allow NULL to be shown. Thought? Regards, --=20 Fujii Masao