References: <7daef094-abf3-4672-bc23-3df4763b16a3@gmail.com>
From: Xuneng Zhou
Date: Wed, 10 Jun 2026 16:36:14 +0800
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
To: Bertrand Drouvot
Cc: "Hayato Kuroda (Fujitsu)", Alexander Lakhin, pgsql-hackers

On Tue, Jun 9, 2026 at 7:01 PM Bertrand Drouvot wrote:
>
> Hi,
>
> On Tue, Jun 09, 2026 at 03:49:50PM +0800, Xuneng Zhou wrote:
> > On Mon, Jun 8, 2026 at 10:34 PM Xuneng Zhou wrote:
> > >
> > > On Mon, Jun 8, 2026 at 10:22 PM Bertrand Drouvot
> > > wrote:
> >
> > I've read through the patch set. They look good overall.
>
> Thanks for the review!
>
> > Here're
> > some comments on them:
> >
> > 1) In the commit messages and comments for all four patches, the
> > reason why the target WAL segment cannot be found on the old timeline
> > is described as follows:
> >
> > "old timeline WAL segments have already been removed or
> > recycled by RemoveNonParentXlogFiles() in CleanupAfterArchiveRecovery()."
> >
> > Is mentioning the 'remove' case only a bit narrow?
> >
> > The timeline-selection comment says this explicitly:
> > "there's no guarantee the old segment will still exist. It may have been
> > deleted or renamed with a .partial suffix"
> >
> > How about phrasing it like:
> > old timeline WAL files may have been removed, recycled, or renamed to .partial.
> >
> > After running the reproducer provided by Hayato-san, the standby’s
> > pg_wal directory looked like this following the failure:
> > 000000010000000000000003.partial
> > 00000002.history
> > 000000020000000000000003
> > 000000020000000000000004
> >
> > So in this repro, the requested file:
> >
> > 000000010000000000000003
> >
> > was not unlinked as a regular "removed" file. It had been renamed to:
> >
> > 000000010000000000000003.partial
> >
> > but the log says this explicitly:
> > ERROR: requested WAL segment 000000010000000000000003 has already been removed
> >
> > It appears inconsistent to me...
>
> I'm not sure. The error message says "has already been removed" and the commit
> messages and comments say "removed or recycled": those are consistent with the
> error message. We're describing the symptom from the walsender's perspective,
> not the exact file operation that caused it.
>
> > 2) Injection points in tests 0002 and 0004
> >
> > It does not prove this:
> > walsender has reached logical_read_xlog_page() while startup is paused
> >
> > 3) Stricter synchronization point in both tests
> > Both tests use this condition "active_pid IS NOT NULL" for
> > synchronization at the walsender side.
> > However, it only proves that
> > pg_recvlogical has connected and walsender has acquired the logical slot,
> > not necessarily that the walsender is paused after acquiring the slot and
> > before the promotion window is set. There are several potential states
> > for walsender in this condition:
> >
> > walsender is just after ReplicationSlotAcquire()
> > it has called XLogBeginRead()
> > it is already inside logical_read_xlog_page()
> > it already opened the WAL segment
> > it already failed or succeeded
> >
> > The test cannot distinguish those states reliably.
> >
> > So we may still need another injection point for synchronization at
> > the walsender side
>
> I agree that with v1 the test could have been fragile. It's fixed in v2 without
> having to add a second injection point. All we have to do is to ensure that
> the decoding occurred while the startup is paused on the new injection point.
>
> 0002 does that by starting the new walsender and doing the decoding while the
> startup is paused
> 0004 does that by ensuring the pre-connected session triggers the decoding while
> the startup is paused

That should work, and it’s cleverer. I was fixated on the idea that we
needed to start the walsender, pause it, suspend the startup process
to enter the promotion window, and then resume the walsender. The
essential thing is just to ensure that the startup remains paused
until decoding output is observed.

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
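
As a side note on the pg_wal listing quoted above: a WAL segment file name is 24 hexadecimal digits, 8 each for the timeline ID, the log number and the segment number, so the requested file and its .partial counterpart can be compared mechanically. Below is a minimal Python sketch of that check. The helper names parse_wal_name and find_segment are hypothetical and are not part of PostgreSQL or of the patches; the listing is the one from the reproducer.

```python
import re

# WAL segment file names are 24 uppercase hex digits: timeline ID, log number
# and segment number (8 digits each), optionally followed by ".partial".
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$")


def parse_wal_name(name):
    """Return (timeline, log, seg, is_partial), or None for non-segment files.

    Hypothetical helper for illustration only.
    """
    m = WAL_NAME.match(name)
    if m is None:
        return None  # e.g. a timeline history file such as 00000002.history
    tli, log, seg = (int(g, 16) for g in m.group(1, 2, 3))
    return tli, log, seg, m.group(4) is not None


def find_segment(listing, wanted):
    """Classify how the segment `wanted` appears in a pg_wal listing.

    Hypothetical helper for illustration only.
    """
    if wanted in listing:
        return "present"
    if wanted + ".partial" in listing:
        return "renamed to .partial"
    return "missing"


# pg_wal contents on the standby after the failure, as quoted in the thread.
pg_wal = [
    "000000010000000000000003.partial",
    "00000002.history",
    "000000020000000000000003",
    "000000020000000000000004",
]

print(parse_wal_name("000000010000000000000003"))        # (1, 0, 3, False)
print(find_segment(pg_wal, "000000010000000000000003"))  # renamed to .partial
```

With this listing, the segment requested on timeline 1 is classified as renamed rather than missing, while the same segment number on timeline 2 is present as a regular file.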