Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wB8Xm-000ltf-1D for pgsql-hackers@arkaria.postgresql.org; Fri, 10 Apr 2026 09:52:54 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wB8Xj-00B1eM-1I for pgsql-hackers@arkaria.postgresql.org; Fri, 10 Apr 2026 09:52:52 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wB8Xi-00B1eE-3C for pgsql-hackers@lists.postgresql.org; Fri, 10 Apr 2026 09:52:52 +0000 Received: from mail-lj1-x236.google.com ([2a00:1450:4864:20::236]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wB8Xh-00000000JPO-2piF for pgsql-hackers@lists.postgresql.org; Fri, 10 Apr 2026 09:52:51 +0000 Received: by mail-lj1-x236.google.com with SMTP id 38308e7fff4ca-3870778358aso16542931fa.1 for ; Fri, 10 Apr 2026 02:52:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1775814767; cv=none; d=google.com; s=arc-20240605; b=Q5MqDEk9i2gulO4QDPOOGqhou7lHFLCw04MCWda3HDT3nZR3Y/BT01f7ZbntOGZ4LK xOypOz/PzbHzowZrWSRDz1AsoGbTUoV7lOJMo4I40pQ3vuvqQF7WyBpMuevIH7ojv0wS msjmQAYDclz7Y7L5v0Hcu7wDkNGxFQe6vyRKKTWXe0z5TUR60suD4MsWDPVtw0w+UBdt kUn5NILd1uDuDWpbSdVycz+9pllmgtpD9g5fXevKEUv+PA5awpWF9OFh+HBjdGUNiP9y dIcmZvO0om2fChw96NuHKMF1X92FtpaIfZCU93Fv6O5xdMOHPF2Js+LNzzPc6AIxFDD7 M7Lg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=4cUWVWGOidqkmVQFxKjEcXSyh7Av5UK5VTJH2BSDROI=; fh=bOOp6KkKdeil1FtxUEkBvnaQ01D6OhBOqAYLFvsP7Vo=; b=ScQWBayWGjU/GpBWI2M5vTZR/JPngGDFtESBpKa8SfhbkLrtc6b1jAMSt49YX6tW1Y k+WJiN1pG4pcAtiBEzctZ1I3aHz2BTxv9dtOA12+tHnkbitOhxQjl2D/d4r9ai87FA9I TwLvRegySCsR1k1Sf1rlAbgBg+/YgrR/2D4bh3iZ3GuIPlsRxWbfrnpdYIj7X1ZvJMw+ ERQm/0DCrQIjeAU5+VD4h5gvKPDZiKP6gwnQGAhGtPoAbIO0QHQDfKy7FKq75otW41NX w3JupLx9h6tDm30jyzk7F5LMQMlQmmotDI5mCT5kIHayc4BrKaMPDa6jR/SOxAuBZIXB jk3w==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1775814767; x=1776419567; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=4cUWVWGOidqkmVQFxKjEcXSyh7Av5UK5VTJH2BSDROI=; b=jyARLrWaE8YBrbCHpe9RHTfM1APx1bSTbMfwkHyE1X7/8sTERVBf4vzN+0XA2pdteI E0SAkBvioBLeB/qVSwgDF8OYEWuqbrtmmquKZnTSqMUzJ/4qAJ0ULga3Szx7J2THoQoe 1jJWbKVpMKx57+UVhPNLmqA9aaOCOctltBcR6PVxlBeBrvQAsTagkmpdvk/FIY78Dvdj qiU4fXFH/T2ayje/kfBqKM+ljwl9Bi/dlTo7y1xeCPJY5UOSJVu6C0ZJNOC5Ew/pJvnC usqwXD9w+FyauXftOkgT2EurvDgx1FP5+w8INjebHQ4NTaLHwix1UYLFIMsK+oJ4ZjcZ lyxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1775814767; x=1776419567; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=4cUWVWGOidqkmVQFxKjEcXSyh7Av5UK5VTJH2BSDROI=; b=F5b/Yj+L2I/pwvzBzStikAkbLrMCNPc5XpDh9leMkbjQvqrKffaXHQlB+iShS+pN1g Xvl7BeqLdHCgdDE8vcCASxUj+IWUaEGZLzoExfXMur7Pw3kZcn5w+2awvPJBivS+xDHE moYMKvT+lxox+e5bBIU0EmuBu6uFgzbLe33+AhfXKAVDePtoDPruJfEp04wtk8kY4DUO 9BTFNUUxlQFNEgo+oGikoS6e5uLRfrvIgTMUtsEctgjvoP8p7GG/O2qDn773002UJhNz qIylXI4TLbpKScIteU/XPDd6LY+Yla+BKQfFsD1TMRdOdbDshCWrsiuO6LeE2q+gabf6 YzlQ== X-Gm-Message-State: AOJu0YwGcoSslxZkvcH3rgCP0yUH2F+RQ0CxU+UTTCYA5MSzMG1VomCl 0VOR80stFAeGc224/2oh+K3cbtr6TGqHjcNt/XRHvvbtKRDWn8131zB1QrumLR+1WpJIX0+h3xE jHa6BSZK6V+/0rCUv58HyH5jbknBYBS0= X-Gm-Gg: AeBDiev5AxboFmPO0hBtRVkN4cDIk3iUynsZ6W6Mq+fQCKXEtbPPSfjBX51wzVjKrlg YCw+LFuPtghHWO07LYvbwLI+eTvyVVgljU9dIyA31wYqWfvgzitkHdbFcImtgui/FE5Z477RmYH 2Y5Fu8d5BicN/ok1SOKCT2ku5i9TRPcBeXgzoFNAthLRmf7qwBcq/OtMOeecdA41P7SJ/d7od4E JRw5j9GNpwyYAm03ovjBVEMdHicTA5eBmhyWy3QI1A60IvqVqWqbP7QxRyqwvigHkZP92DZTcrU rJZvNy4zvZ7ep/dLLiJj7Wz4zBRmq43kzfM9VR5+ X-Received: by 2002:a05:651c:e0a:b0:38b:f632:e0fc with SMTP id 38308e7fff4ca-38e4bbf4c3fmr7877271fa.1.1775814766468; Fri, 10 Apr 2026 02:52:46 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Amit Kapila Date: Fri, 10 Apr 2026 15:22:34 +0530 X-Gm-Features: AQROBzBKF2aw8cxftStvHVaQokNIXWVtiDWaGhEP2YI4s_ZBqkVhcfc57o90jGE Message-ID: Subject: Re: Fix slotsync worker busy loop causing repeated log messages To: "Zhijie Hou (Fujitsu)" Cc: PostgreSQL Hackers , Fujii Masao Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Apr 10, 2026 at 8:28=E2=80=AFAM Zhijie Hou (Fujitsu) wrote: > > On Sunday, March 22, 2026 1:47 AM Amit Kapila w= rote: > > On Thu, Mar 19, 2026 at 7:08=E2=80=AFPM Fujii Masao > > wrote: > > > > > > On Sat, Feb 28, 2026 at 2:03=E2=80=AFPM Amit Kapila > > wrote: > > > > > > > > On Fri, Feb 27, 2026 at 8:34=E2=80=AFPM Fujii Masao > > wrote: > > > > > > > > > > Normally, the slotsync worker updates the standby slot using the > > > > > primary's slot state. However, when confirmed_flush_lsn matches > > > > > but restart_lsn does not, the worker does not actually update the > > > > > standby slot. Despite that, the current code of > > > > > update_local_synced_slot() appears to treat this situation as if > > > > > an update occurred. As a result, the worker sleeps only for the > > > > > minimum interval (200 ms) before retrying. In the next cycle, it > > > > > again assumes an update happened, and continues looping with the > > > > > short sleep interval, causing the repeated logical decoding log m= essages. > > Based on a quick analysis, this seems to be the root cause. > > > > > > > > > > I think update_local_synced_slot() should return false (i.e., no > > > > > update > > > > > happened) when confirmed_flush_lsn is equal but restart_lsn > > > > > differs between primary and standby. > > > > > > > > > > > > > We expect that in such a case update_local_synced_slot() should > > > > advance local_slot's 'restart_lsn' via > > > > LogicalSlotAdvanceAndCheckSnapState(), otherwise, it won't go in th= e > > > > cheap code path next time. Normally, restart_lsn advancement should > > > > happen when we process XLOG_RUNNING_XACTS and call > > > > SnapBuildProcessRunningXacts(). In this particular case as both > > > > restart_lsn and confirmed_flush_lsn are the same (0/03000140), the > > > > machinery may not be processing XLOG_RUNNING_XACTS record. I have > > > > not debugged the exact case yet but you can try by emitting some > > > > more records on publisher, it should let the standby advance the > > > > slot. It is possible that we can do something like you are proposin= g > > > > to silence the LOG messages but we should know what is going on her= e. > > > > > > I agree it's important to investigate why restart_lsn doesn't advance > > > in that case and fix that issue. > > > > > > Separately, I think mistreating the slot as updated is another proble= m: > > > the local slot is treated as updated even when it isn't, which causes > > > the slotsync worker to use the minimum sleep interval. We should > > > address this independently. > > > > > > LogicalSlotAdvanceAndCheckSnapState() doesn't always update the local > > > slot, but update_local_synced_slot() currently assumes it does. That > > > seems to be the root cause. The attached 0001 patch makes > > > LogicalSlotAdvanceAndCheckSnapState() return whether the slot was > > > actually updated, and updates update_local_synced_slot() to act based= on > > that result. > > > > > > I've also attached 0002, which reduces the log level of some logical > > > decoding messages (e.g., "logical decoding found consistent point") > > > from LOG to DEBUG1. These are low-level, developer-oriented messages > > > and can be noisy when they occur regularly. This is the same issue I > > reported earlier at [1]. > > > > > > How about applying these patches while we continue working on the > > > restart_lsn issue? > > > > > > > * > > @@ -2174,7 +2193,10 @@ > > LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr moveto, > > > > if (XLogRecPtrIsValid(ctx->reader->EndRecPtr)) > > { > > - LogicalConfirmReceivedLocation(moveto); > > + bool slot_updated =3D LogicalConfirmReceivedLocation(moveto); > > + > > + if (updated_xmin_or_lsn) > > + *updated_xmin_or_lsn =3D slot_updated; > > > > BTW, LogicalSlotAdvanceAndCheckSnapState() could also advance slot whil= e > > processing running_xact record, so not sure we can rely only on the exp= licit > > call LogicalConfirmReceivedLocation() above to ascertain the same. > > Perhaps we could simply compare the slot's old and new LSN/xmin to determ= ine > whether updated_xmin_or_lsn needs to be set. > The patch works for me. Can we change the comment to as follows: "It is possible that the slot's xmin or LSNs are not updated, when the synced slot has reached consistent snapshot state or cannot build the one at all."? --=20 With Regards, Amit Kapila.