From: Amit Kapila
Date: Tue, 3 Mar 2026 15:55:27 +0530
Subject: Re: Fix slotsync worker busy loop causing repeated log messages
To: "Zhijie Hou (Fujitsu)"
Cc: Fujii Masao, PostgreSQL Hackers

On Tue, Mar 3, 2026 at 1:12 PM Zhijie Hou (Fujitsu) wrote:
>
> On Saturday, February 28, 2026 1:03 PM Amit Kapila wrote:
> > On Fri, Feb 27, 2026 at 8:34 PM Fujii Masao wrote:
> > >
> > > Normally, the slotsync worker updates the standby slot using the
> > > primary's slot state. However, when confirmed_flush_lsn matches but
> > > restart_lsn does not, the worker does not actually update the standby
> > > slot. Despite that, the current code of update_local_synced_slot()
> > > appears to treat this situation as if an update occurred. As a result,
> > > the worker sleeps only for the minimum interval (200 ms) before
> > > retrying. In the next cycle, it again assumes an update happened, and
> > > continues looping with the short sleep interval, causing the repeated
> > > logical decoding log messages.
> > > Based on a quick analysis, this seems to be the root cause.
> > >
> > > I think update_local_synced_slot() should return false (i.e., no
> > > update happened) when confirmed_flush_lsn is equal but restart_lsn
> > > differs between primary and standby.
> > >
> >
> > We expect that in such a case update_local_synced_slot() should advance
> > local_slot's 'restart_lsn' via LogicalSlotAdvanceAndCheckSnapState(),
> > otherwise, it won't go in the cheap code path next time. Normally, restart_lsn
> > advancement should happen when we process XLOG_RUNNING_XACTS and
> > call SnapBuildProcessRunningXacts(). In this particular case as both
> > restart_lsn and confirmed_flush_lsn are the same (0/03000140), the
> > machinery may not be processing XLOG_RUNNING_XACTS record. I have not
> > debugged the exact case yet but you can try by emitting some more records
> > on publisher, it should let the standby advance the slot. It is possible that we
> > can do something like you are proposing to silence the LOG messages but we
> > should know what is going on here.
>
> I reproduced and debugged this issue where a replication slot's restart_lsn
> fails to advance. In my environment, I found it only occurs when a synced
> slot first builds a consistent snapshot. The problematic code path is in
> SnapBuildProcessRunningXacts():
>
>     if (builder->state < SNAPBUILD_CONSISTENT)
>     {
>         /* returns false if there's no point in performing cleanup just yet */
>         if (!SnapBuildFindSnapshot(builder, lsn, running))
>             return;
>     }
>
> When a synced slot reaches consistency for the first time with no running
> transactions, SnapBuildFindSnapshot() returns false, causing the function to
> return without updating the candidate restart_lsn.
>
> So, an alternative approach is to improve this logic by updating the candidate
> restart_lsn in this case instead of returning early.
>

But why not return 'true' from SnapBuildFindSnapshot() in that case?
The comment atop SnapBuildFindSnapshot() says: "Returns true if there is a
point in performing internal maintenance/cleanup using the xl_running_xacts
record." Doesn't updating restart_lsn fall under that category? However, I
have a question: even if we haven't incremented it in the first cycle, why is
restart_lsn not being incremented in consecutive sync cycles?

-- 
With Regards,
Amit Kapila.
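
For readers following the thread, here is a minimal sketch of the sleep-interval behaviour described in the quoted report: a cycle that reports "updated" keeps the worker at the minimum nap time, while a cycle that reports "not updated" lets it back off. Only the 200 ms minimum comes from the discussion above. The 30000 ms cap, the doubling rule, and the names next_naptime_ms() and naptime_after_cycles() are assumptions made for illustration and are not taken from the slotsync worker code.

```c
#include <stdbool.h>

/* Minimum nap time mentioned in the thread. */
#define MIN_NAPTIME_MS 200L
/* Hypothetical upper bound, assumed for this sketch only. */
#define MAX_NAPTIME_MS 30000L

/*
 * Assumed backoff rule: reset to the minimum when some slot was reported
 * as updated, otherwise double the interval up to the cap.
 */
static long
next_naptime_ms(long current_ms, bool some_slot_updated)
{
    long doubled;

    if (some_slot_updated)
        return MIN_NAPTIME_MS;

    doubled = current_ms * 2;
    return (doubled > MAX_NAPTIME_MS) ? MAX_NAPTIME_MS : doubled;
}

/*
 * Run 'cycles' sync cycles in which every cycle reports the same result,
 * and return the nap time in effect afterwards.
 */
static long
naptime_after_cycles(int cycles, bool reported_update)
{
    long nap_ms = MIN_NAPTIME_MS;

    for (int i = 0; i < cycles; i++)
        nap_ms = next_naptime_ms(nap_ms, reported_update);

    return nap_ms;
}
```

Under these assumptions, a slot that is wrongly reported as updated on every cycle pins the nap time at 200 ms indefinitely, which matches the repeated log messages in the report. Reporting "no update" instead lets the interval grow, but as noted in the reply, that would only quiet the messages and would not explain why restart_lsn fails to advance.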