Date: Tue, 18 Mar 2025 16:27:48 +0300
To: pgsql-general@postgresql.org
From: Evgeniy Ratkov
Subject: Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.

On 08/09/2024 15:26, Heikki Linnakangas wrote:
> 2. Independently of pg_rewind: When you start PostgreSQL, it will first
> try to recover all the WAL it has locally in pg_wal. That goes wrong if
> you have set a recovery target TLI. For example, imagine this situation:
>
> - Recovery target TLI is 2, set explicitly in postgresql.conf
> - The switchpoint from TLI 1 to 2 happened at WAL position 0/1510198
>   (the switchpoint is found in 00000002.history)
> - There is a WAL file 000000010000000000000001 under pg_wal, which
>   contains valid WAL up to 0/1590000
>
> When you start the server, it will first recover all the WAL from
> 000000010000000000000001, up to 0/1590000. Then it will connect to the
> primary to fetch more WAL, but it will fail to make any progress because
> it already blew past the switch point.
>
> It's obviously wrong to replay the WAL from timeline 1 beyond the 1->2
> switchpoint, when the recovery target is TLI 2. The attached
> 0003-Don-t-read-past-current-TLI-during-archive-recovery.patch fixes
> that. However, the logic to find the right WAL segment file and read the
> WAL is extremely complicated, and I don't feel comfortable that I got
> all the cases right. Review would be highly appreciated.
>
> The patch includes a test case to demonstrate the case, with no
> pg_rewind. It does include one "manual" step to copy a timeline history
> file into pg_wal, marked with XXX, however. So I'm not sure how possible
> this scenario is in production setups.

Hello, Heikki Linnakangas.

Your patch 0003-Don-t-read-past-current-TLI-during-archive-recovery.patch
fixes the problem with backup recovery on a standby, which I described at
https://www.postgresql.org/message-id/acf3141b-c78d-4f28-8e15-92ed8144331e%40arenadata.io

That thread also contains a test that may demonstrate the problem.

Thank you.
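
P.S. To make the quoted scenario concrete, here is a minimal Python sketch of the rule "do not replay a parent timeline past its switchpoint". It is only an illustration and not the logic of the patch or of PostgreSQL itself: the helper names (parse_lsn, format_lsn, replay_limit) and the dictionary that stands in for the contents of 00000002.history are assumptions made for this example.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' (two hex numbers) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Convert a 64-bit integer back to the 'hi/lo' hex notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def replay_limit(wal_tli: int, valid_upto: int, switchpoints: dict) -> int:
    """Return the LSN up to which WAL from timeline wal_tli may be replayed.

    switchpoints is a hypothetical stand-in for the target timeline's history:
    it maps a parent TLI to the LSN at which that timeline was left.
    WAL from a parent timeline must not be replayed beyond its switchpoint.
    """
    if wal_tli in switchpoints:
        return min(valid_upto, switchpoints[wal_tli])
    # No switchpoint recorded for this timeline: all valid WAL may be replayed.
    return valid_upto


# Values from the quoted scenario: target TLI 2, switch from TLI 1 at
# 0/1510198, local segment of TLI 1 containing valid WAL up to 0/1590000.
history_of_tli_2 = {1: parse_lsn("0/1510198")}
limit = replay_limit(1, parse_lsn("0/1590000"), history_of_tli_2)
print(format_lsn(limit))
```

With the values from the scenario, the limit is the switchpoint rather than the end of the valid WAL in the local segment, which is the behaviour the quoted message describes as correct.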