Date: Tue, 9 Jun 2026 11:01:28 +0000
From: Bertrand Drouvot
To: Xuneng Zhou
Cc: "Hayato Kuroda (Fujitsu)", Alexander Lakhin, pgsql-hackers
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
References: <7daef094-abf3-4672-bc23-3df4763b16a3@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="Wctt8FE9hhxDGrBZ"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

--Wctt8FE9hhxDGrBZ
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Hi,

On Tue, Jun 09, 2026 at 03:49:50PM +0800, Xuneng Zhou wrote:
> On Mon, Jun 8, 2026 at 10:34 PM Xuneng Zhou wrote:
> >
> > On Mon, Jun 8, 2026 at 10:22 PM Bertrand Drouvot
> > wrote:
>
> I've read through the patch set. They look good overall.

Thanks for the review!

> Here are
> some comments on them:
>
> 1) In the commit messages and comments for all four patches, the
> reason why the target WAL segment cannot be found on the old timeline
> is described as follows:
>
> "old timeline WAL segments have already been removed or
> recycled by RemoveNonParentXlogFiles() in CleanupAfterArchiveRecovery()."
>
> Is mentioning the 'remove' case only a bit narrow?
>
> The timeline-selection comment says this explicitly:
> "there's no guarantee the old segment will still exist. It may have been
> deleted or renamed with a .partial suffix"
>
> How about phrasing it like:
> old timeline WAL files may have been removed, recycled, or renamed to .partial.
>
> After running the reproducer provided by Hayato-san, the standby’s
> pg_wal directory looked like this following the failure:
> 000000010000000000000003.partial
> 00000002.history
> 000000020000000000000003
> 000000020000000000000004
>
> So in this repro, the requested file:
>
> 000000010000000000000003
>
> was not unlinked as a regular "removed" file. It had been renamed to:
>
> 000000010000000000000003.partial
>
> but the log says this explicitly:
> ERROR: requested WAL segment 000000010000000000000003 has already been removed
>
> It appears inconsistent to me...

I'm not sure. The error message says "has already been removed" and the commit
messages and comments say "removed or recycled": those are consistent with the
error message. We're describing the symptom from the walsender's perspective,
not the exact file operation that caused it.

> 2) Injection points in tests 0002 and 0004
>
> It does not prove this:
> walsender has reached logical_read_xlog_page() while startup is paused
>
> 3) Stricter synchronization point in both tests
> Both tests use this condition "active_pid IS NOT NULL" for
> synchronization at the walsender side. However, it only proves that
> pg_recvlogical has connected and the walsender has acquired the logical slot,
> not necessarily that the walsender is paused after acquiring the slot and
> before the promotion window is set. There are several potential states
> for walsender in this condition:
>
> walsender is just after ReplicationSlotAcquire()
> it has called XLogBeginRead()
> it is already inside logical_read_xlog_page()
> it already opened the WAL segment
> it already failed or succeeded
>
> The test cannot distinguish those states reliably.
>
> So we may still need another injection point for synchronization at
> the walsender side

I agree that with v1 the test could have been fragile. It's fixed in v2 without
having to add a second injection point.
All we have to do is ensure that the decoding occurs while the startup process
is paused on the new injection point.

0002 does that by starting the new walsender and doing the decoding while the startup process is paused
0004 does that by ensuring the pre-connected session triggers the decoding while the startup process is paused

> 4) Stricter result checks in test files
> The surrounding 035 test is stricter than the test in 0002. It first
> waits for COMMITs, then compares exact decoded output. Should we
> adhere to this pattern too?

That's done in v2 (which also addresses Hayato-san's feedback).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

--Wctt8FE9hhxDGrBZ
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="v2-0001-Fix-race-condition-in-logical-decoding-timeline-s.patch"