Subject: BUG #19477: pg_rewind does not rewind a diverging timeline
To: pgsql-bugs@lists.postgresql.org
From: PG Bug reporting form
Cc: mats@kindahl.net
Reply-To: mats@kindahl.net, pgsql-bugs@lists.postgresql.org
Date: Thu, 14 May 2026 09:16:38 +0000
Message-ID: <19477-71b6bc584cc5ae26@postgresql.org>

The following bug has been logged on the website:

Bug reference:      19477
Logged by:          Mats Kindahl
Email address:      mats@kindahl.net
PostgreSQL version: 18.3
Operating system:   All
Description:

TLC found a scenario that I assume is already known but that does not seem to be fixed. It is a relatively rare case, but since the fix is quite easy, I thought I'd share it with you and get feedback.

The scenario can occur if you're unlucky and have more than one crash while promoting standbys to primaries. It goes like this:

You have three servers, S1, S2, and S3. S1 is the primary, and S2 and S3 are standbys. All are on timeline (TLI) 1.

1. S1 crashes.
2. S1 recovers and starts promotion. It writes XLOG_END_OF_RECOVERY (EOR) for TLI 2 to the WAL.
3. S1 manages to write some records, W1, to the WAL.
4. Before the EOR is replicated to any standby, S1 crashes again. It is now on TLI 2 and has some changes that exist nowhere else.
5. S2 is promoted. It writes an EOR for TLI 2 to the WAL, since it is not aware of any other timeline.
6. S2 writes some records, W2, to the WAL. S1 now holds one version of TLI 2 (TLI 2.1) and S2 is on another (TLI 2.2).
7. S1 recovers and wants to join as a standby.
   You run pg_rewind to get rid of the extra data, but since S2 is also on TLI 2, pg_rewind assumes that both servers are on the same timeline.
8. S1 is now a standby but still has the extra records W1, both in the WAL and in the database.

There is a fix posted to pgsql-hackers@lists.postgresql.org [1] that solves this by adding a UUIDv7 to the timeline history file.

[1]: https://www.postgresql.org/message-id/flat/CAN305gBeJr8m7ZRW9mH0zakEFR4hDUPDo8fJRKJOHWMORG5_Bg%40mail.gmail.com
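
To make the scenario concrete, here is a minimal toy model in Python. It is not pg_rewind's actual code. It assumes that S1 and S2 both switched to TLI 2 at the same LSN (the value 1000 is made up), and it assumes a hypothetical per-timeline identifier field, `timeline_uuid`, standing in for the UUIDv7 that the proposed fix adds. The placeholder strings are not real UUIDs. The sketch shows that comparing only the timeline ID and switch point makes the two histories look identical, whereas also comparing the identifier exposes the divergence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    tli: int                              # timeline ID
    begin_lsn: int                        # LSN where this timeline starts
    timeline_uuid: Optional[str] = None   # hypothetical unique identifier


def last_common_entry(a: List[HistoryEntry],
                      b: List[HistoryEntry],
                      use_uuid: bool = False) -> int:
    """Return the index of the last history entry both servers share, or -1."""
    common = -1
    for i, (ea, eb) in enumerate(zip(a, b)):
        same = ea.tli == eb.tli and ea.begin_lsn == eb.begin_lsn
        if use_uuid:
            same = same and ea.timeline_uuid == eb.timeline_uuid
        if not same:
            break
        common = i
    return common


# TLI 1 is genuinely shared by both servers.
tli1 = HistoryEntry(tli=1, begin_lsn=0, timeline_uuid="id-shared-tli1")

# Both servers independently created "TLI 2" at the same (assumed) LSN.
s1_history = [tli1, HistoryEntry(tli=2, begin_lsn=1000, timeline_uuid="id-s1-tli2")]
s2_history = [tli1, HistoryEntry(tli=2, begin_lsn=1000, timeline_uuid="id-s2-tli2")]

# Without the identifier, TLI 2 looks common to both: no divergence is seen.
print(last_common_entry(s1_history, s2_history))                 # 1

# With the identifier, the last shared entry is TLI 1: divergence is detected.
print(last_common_entry(s1_history, s2_history, use_uuid=True))  # 0
```

In this model, index 1 means "both servers are on the same TLI 2", which is the wrong conclusion from step 7. Index 0 means that the histories diverged after TLI 1, so S1 would need to be rewound.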