Subject: Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
From: Chao Li
Date: Fri, 13 Mar 2026 10:15:46 +0800
To: Fujii Masao
Cc: Shinya Kato, PostgreSQL Hackers

> On Mar 12, 2026, at 23:27, Fujii Masao wrote:
>
> On Wed, Mar 11, 2026 at 11:39 AM Shinya Kato wrote:
>>
>> On Tue, Mar 10, 2026 at 10:54 AM Fujii Masao wrote:
>>> Even with your latest patch, if we remove fullyAppliedLastTime, and set
>>> clearLagTimes to true when applyPtr == sentPtr && noLagSamples &&
>>> positionsUnchanged,
>>> wouldn't the time for the lag to become NULL be almost the same as
>>> wal_receiver_status_interval?
>>>
>>> The documentation doesn't clearly specify how long it should take for
>>> the lag to become NULL, so doubling that time might be acceptable.
>>> However, if we can keep it roughly the same without much complexity,
>>> I think that would be preferable.
>>>
>>> Thought?
>>
>> Thank you for the suggestion. I tested this by removing
>> fullyAppliedLastTime, but even with synchronous replication, NULL
>> still appears. Here is why:
>>
>> - Reply 1 (flush notification): positions = X. Lag samples are
>> consumed with real values, so noLagSamples = false. clearLagTimes is
>> not set, and prevPtrs = X is saved.
>>
>> - Reply 2 (force_reply): positions = X again. Here, noLagSamples =
>> true and positionsUnchanged = true. Since applyPtr == sentPtr,
>> clearLagTimes is set to true, resulting in a NULL value.
>>
>> Therefore, I believe fullyAppliedLastTime is still necessary to ensure
>> that the previous reply also contained no lag samples.
>
> Thanks for testing and for the clarification! You're right.
>
> However, if we apply this change, the time required for the lag information to
> be reset would effectively double.
> I start wondering if that's really
> acceptable, especially for back branches. Although the docs doesn't clearly
> specify this timing, doubling it could affect systems that monitor
> replication lag, for example. It might still be reasonable to apply
> such a change in master, though.
>
> On further thought, the root cause seems to be that walreceiver can send
> two consecutive status reply messages with identical WAL locations even
> when wal_receiver_status_interval has not yet elapsed. Addressing that
> behavior directly might resolve the issue you reported. I've attached a PoC
> patch that does this. Thought?
>
> Regards,
>
> --
> Fujii Masao

I just read v4. The approach looks good to me overall. I have a few comments about the naming.

This patch changes the old force-reply logic to an applied-location-driven reply: a reply is now sent only if the applied location has advanced. However, this applied-location-driven reply is still triggered from WalRcvForceReply(), so the function has effectively lost its original “force” semantics. Because of that, it might be better to rename WalRcvForceReply() to something like WalRcvRequestApplyStatusUpdate().

Then,
```
static void
XLogWalRcvSendReply(bool force, bool requestReply, bool replyApply)
```
replyApply reads like “send an apply reply”, but in reality it indicates that the applied location should be checked to decide whether to send the reply. So it might be clearer to rename it to something like checkApplyStatus.

Lastly,
```
sig_atomic_t reply_apply; /* used as a bool */
```
reply_apply sounds like an action of “reply with apply”, but what it actually represents is that the startup process requested an applied-location-driven reply. If the applied location has not advanced, the reply won’t be sent. So a name like apply_update_requested might better reflect the meaning.
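
To make the three renames concrete, here is a rough, self-contained sketch of how the names could read together. It is not code from the patch: the struct, the lastReportedApply field, the extra applyPtr argument, the bool return value and the HandleApplyUpdateRequest() helper are all hypothetical simplifications for illustration, XLogRecPtr is a local stand-in typedef, and the wal_receiver_status_interval timing is left out entirely.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the PostgreSQL type */

/* Hypothetical stand-in for the shared walreceiver state. */
typedef struct WalRcvSketch
{
	sig_atomic_t apply_update_requested;	/* used as a bool */
	XLogRecPtr	lastReportedApply;	/* hypothetical: apply location in last reply */
} WalRcvSketch;

static WalRcvSketch walrcv = {0, 0};

/*
 * Proposed name for WalRcvForceReply(): the startup process only requests
 * an apply status update; it no longer forces a reply.
 */
static void
WalRcvRequestApplyStatusUpdate(void)
{
	walrcv.apply_update_requested = 1;
}

/*
 * Simplified model of XLogWalRcvSendReply() with replyApply renamed to
 * checkApplyStatus.  Returns true if a reply would be sent.  The applyPtr
 * argument and the return value are assumptions made for this sketch.
 */
static bool
XLogWalRcvSendReply(bool force, bool requestReply, bool checkApplyStatus,
					XLogRecPtr applyPtr)
{
	bool		send = force;

	(void) requestReply;		/* not modelled in this sketch */

	/* Applied-location-driven reply: send only if apply has advanced. */
	if (checkApplyStatus && applyPtr > walrcv.lastReportedApply)
		send = true;

	if (send)
		walrcv.lastReportedApply = applyPtr;
	return send;
}

/* Hypothetical helper: consume the request flag in the walreceiver loop. */
static bool
HandleApplyUpdateRequest(XLogRecPtr applyPtr)
{
	if (!walrcv.apply_update_requested)
		return false;
	walrcv.apply_update_requested = 0;
	return XLogWalRcvSendReply(false, false, true, applyPtr);
}
```

With these names, a call site reads as "the startup process requested an apply status update, so check the apply status", and it is clearer that an unchanged applied location results in no reply.
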

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/