Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w40PW-001ltY-0P for pgsql-hackers@arkaria.postgresql.org; Sat, 21 Mar 2026 17:46:54 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w40PT-00B6KU-1A for pgsql-hackers@arkaria.postgresql.org; Sat, 21 Mar 2026 17:46:51 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w40PS-00B6KJ-34 for pgsql-hackers@lists.postgresql.org; Sat, 21 Mar 2026 17:46:51 +0000 Received: from mail-lf1-x133.google.com ([2a00:1450:4864:20::133]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w40PR-00000000M56-1tCW for pgsql-hackers@lists.postgresql.org; Sat, 21 Mar 2026 17:46:50 +0000 Received: by mail-lf1-x133.google.com with SMTP id 2adb3069b0e04-5a27b5ad832so2760744e87.2 for ; Sat, 21 Mar 2026 10:46:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774115208; cv=none; d=google.com; s=arc-20240605; b=T7wnJQF0Qz9ITP9O2yxSrwZjgatqeao42LpQT//K1ucqlurBTukYWdM6CjnijH0dGW 9umW0Cv1McfMRD53o6uY52awAU57G0HIAevUB/Q+Is0t+BEpmCa4tR5k2yMiGnjb2jqa iOBmoGUpmBBRBGB4SU5Cm1CgWXel7xEjVGeJ/K30XUnExA9Ycc43kv8HZKj2+dxuZr/Y jt3gGQY9mR2toduNLwiJmc86xY7+6MAAk8FjeDMgqWHhph4OG3oEegKF5UKZW3VMR4Nz JuHXm8d/4j4OlnTIHcRQtf0yleS9PAAIGavHNOEx53W6VCiR/xOUGsB03Kt4zD2d5YXY Wa9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=oFdqsKrJGUwUfk3VYsQJl1ASr6g3j9gZdCIMcwUTCiY=; fh=x4xgi4tdDukVB0fBuAIe3vORvD36rbbTBlhrrGN8QbM=; b=aGOa602x4cgr3JKE3dgSIA9JyMQ3AwDvTRNi5z4YkwfYX5+3Jpibs+VtTTbmtZlwOw nC2nfDMp4JZeCvdrzUOOFscmOV1caagTNp+g1bxJp8n7NjdQ79ivp8Xel2mkiy7tWctO pmjHsudcAouEUIUIaEUZceP5ejLAp4sPRbBDTsGY2NKMZjhyVnN1DtzaUk/zjZUKw7wf asJt8TdzsYmbV/S436ELaPhEAPGuLlM2mcda/Nr4nHRiXSNzwa8qsUZa32WUIHoJSMZi wHwTa3EM84uhnQJW5EBtrgLwqhfKt5DHKyXwS1hirph0XtQ4Ujy1t+pbjqpdhCIOdKxf p2aA==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1774115208; x=1774720008; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=oFdqsKrJGUwUfk3VYsQJl1ASr6g3j9gZdCIMcwUTCiY=; b=m14wSqXUFkPNyXpNQWhyBtGlpCUNsgNCb3DQgjy+vrWXXIRyuNFnx0BPf7a4XAJiQt OtXNI1LQc/XqAE251dU8eGgBlFi2OYoZ8STDN+qKUypajAgkGMrAJEbFK/mhKU6c40dK DvKUhdrd58NAPnRq0HQnUCBtlzXaNR8SfQw0kZu4tSFvrpHWoDAk0vQrKjZ6cAN/Rgdg tVLSLuWEvU23f0O8AblvwnbQ9NwCRl42ghkGmt6BGO8QfpfxrXiPbsz0IllCL+OBOeVF OCV4xEpnzFqsCbp185csCr3pknyOtpzcn//GUwAiQ+AQ6rKb/n64BrzluYSJ/1Jx0zbp fcmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774115208; x=1774720008; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=oFdqsKrJGUwUfk3VYsQJl1ASr6g3j9gZdCIMcwUTCiY=; b=q5g3qmaEveqsP04IMv1WRTYBKDBCD/2lfVKSCpJBpiOmlzQVL0rq/6aaZxagfaEy7G jLzq7u4h0kZXi7fS4Uc2ravkuoQaDa+6oTPnQV8W8P/L9xbrNFYAe3vvwt+4YNSI/cjh d01+bAMvV9zzcUmiB+t87ylbBB4RxFOI4IXifsuusc7H7pWMf2IKAWIv7PeAtub1UbxG 6itXhk6zYaTtSdURuhVSmdkiqKVu/rTrfV5C9q+YNvjQ09ll2r8MG/f30SwxqNItwfPf 9+ATmliSW+vKEekUovlF5DYFqmM1yehlH7XOuKDsw27BdgbRzo4Ozk2cr1NaCWdiI/B1 Ht3Q== X-Gm-Message-State: AOJu0Yyrat20L7lLAPPjJB30GN5Dk8J4GgGmBHwoyFngyl9+LShDHkHZ MAnZLgh5D6PqZeKRJTFp44NFgBWGdJUcaTvE0VIOW0qJerYEaGsagQXnzTJF5ApVy5SfntkdEqe qO+1HcaoMd0mPfzDefqWhmjmL9+TKMBQ= X-Gm-Gg: ATEYQzyi/zgVIFlIBS1Wh9LP2HKToTyby4E3InK1FJnAXDPP9fuqaLeHtt0I6RcQxS9 kj7jKzRCSfLFXDsmSaJsjJz9O0fK6it9KwA0VP3U/V+sP0d9Wts1HygcaaHjjw9864b7g7nFgRP sl7Wr0GxMJFyuXaOOUuynuxwRnOX6FHJFDzgs9wMUGaZUJrDkijA+pPPQJuitVeZocL8NvWc/AC +vIyZeM4pR3vwISKN9Oh3EiqW5lPluodrSekDWR63lwgnmoABWM3nG7Nr2A2I6yEy1OxvCJOvN2 oJSxeKCdVDKPKu+bLiS2j5gQzdvCJl2x+rOUGJ4zUdM9rhiW6ZeaRSFfC531SJP5wInnfVBCZw= = X-Received: by 2002:a05:6512:308d:b0:5a2:8276:ea3 with SMTP id 2adb3069b0e04-5a285b5d403mr2580888e87.40.1774115207422; Sat, 21 Mar 2026 10:46:47 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Amit Kapila Date: Sat, 21 Mar 2026 23:16:36 +0530 X-Gm-Features: AaiRm53J4VZ5O4xXyVUvhNJ9KKRYRbMAnpG2bX0ELUyCYL8WuuNvyfrkByq16O4 Message-ID: Subject: Re: Fix slotsync worker busy loop causing repeated log messages To: Fujii Masao Cc: PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Mar 19, 2026 at 7:08=E2=80=AFPM Fujii Masao = wrote: > > On Sat, Feb 28, 2026 at 2:03=E2=80=AFPM Amit Kapila wrote: > > > > On Fri, Feb 27, 2026 at 8:34=E2=80=AFPM Fujii Masao wrote: > > > > > > Normally, the slotsync worker updates the standby slot using the prim= ary's slot > > > state. However, when confirmed_flush_lsn matches but restart_lsn does= not, > > > the worker does not actually update the standby slot. Despite that, t= he current > > > code of update_local_synced_slot() appears to treat this situation as= if > > > an update occurred. As a result, the worker sleeps only for the minim= um > > > interval (200 ms) before retrying. In the next cycle, it again assume= s > > > an update happened, and continues looping with the short sleep interv= al, > > > causing the repeated logical decoding log messages. Based on a quick = analysis, > > > this seems to be the root cause. > > > > > > I think update_local_synced_slot() should return false (i.e., no upda= te > > > happened) when confirmed_flush_lsn is equal but restart_lsn differs b= etween > > > primary and standby. > > > > > > > We expect that in such a case update_local_synced_slot() should > > advance local_slot's 'restart_lsn' via > > LogicalSlotAdvanceAndCheckSnapState(), otherwise, it won't go in the > > cheap code path next time. Normally, restart_lsn advancement should > > happen when we process XLOG_RUNNING_XACTS and call > > SnapBuildProcessRunningXacts(). In this particular case as both > > restart_lsn and confirmed_flush_lsn are the same (0/03000140), the > > machinery may not be processing XLOG_RUNNING_XACTS record. I have not > > debugged the exact case yet but you can try by emitting some more > > records on publisher, it should let the standby advance the slot. It > > is possible that we can do something like you are proposing to silence > > the LOG messages but we should know what is going on here. > > I agree it's important to investigate why restart_lsn doesn't advance > in that case and fix that issue. > > Separately, I think mistreating the slot as updated is another problem: > the local slot is treated as updated even when it isn't, which causes > the slotsync worker to use the minimum sleep interval. We should > address this independently. > > LogicalSlotAdvanceAndCheckSnapState() doesn't always update the local slo= t, > but update_local_synced_slot() currently assumes it does. That seems to b= e > the root cause. The attached 0001 patch makes > LogicalSlotAdvanceAndCheckSnapState() return whether the slot was actuall= y > updated, and updates update_local_synced_slot() to act based on that resu= lt. > > I've also attached 0002, which reduces the log level of some logical deco= ding > messages (e.g., "logical decoding found consistent point") from LOG to > DEBUG1. These are low-level, developer-oriented messages and can be noisy > when they occur regularly. This is the same issue I reported earlier at [= 1]. > > How about applying these patches while we continue working on > the restart_lsn issue? > * @@ -2174,7 +2193,10 @@ LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr movet= o, if (XLogRecPtrIsValid(ctx->reader->EndRecPtr)) { - LogicalConfirmReceivedLocation(moveto); + bool slot_updated =3D LogicalConfirmReceivedLocation(moveto); + + if (updated_xmin_or_lsn) + *updated_xmin_or_lsn =3D slot_updated; BTW, LogicalSlotAdvanceAndCheckSnapState() could also advance slot while processing running_xact record, so not sure we can rely only on the explicit call LogicalConfirmReceivedLocation() above to ascertain the same. Sorry, I couldn't get the chance to look at the patches proposed by Hou-san to fix this issue but I'll look at it after the feature freeze. * For 0002, I think the current log_level is used since logical decoding is introduced, so if we want to change those, we should start a separate thread. BTW, as a developer, I find those LOG level messages quite useful while debugging issues, so I am in favor of keeping those as it is. However, if the consensus is otherwise, we can change them as well. --=20 With Regards, Amit Kapila.