From: Japin Li
To: Mats Kindahl
Cc: surya poondla, pgsql-hackers@lists.postgresql.org
Subject: Re: pg_rewind does not rewind diverging timelines
Date: Mon, 25 May 2026 13:20:57 +0800

Hi, Mats

On Sun, 24 May 2026 at 20:30, Mats Kindahl wrote:
> On Fri, May 22, 2026 at 12:09 AM surya poondla wrote:
>
> Hi Mats,
>
> Thanks for picking this up -- the scenario is a real one and I think the
> UUID-tagging approach is a clean way to solve it. v2 applies and builds
> without trouble, and the core algorithm reads well to me.
> I have a handful of observations that I'd love your thoughts on.
>
> Hi Surya,
>
> Thank you for the review. It is quite a rare scenario, but it is real and
> the fix is simple.
>
> Regarding Correctness I have the thoughts below
>
> 1. UUIDv7 timestamp epoch.
> In StartupXLOG():
>     TimestampTz now = GetCurrentTimestamp();
>     generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
>                       (uint32)(now % 1000) * 1000);
>
> I think there might be a small mismatch here: GetCurrentTimestamp() returns
> microseconds since the Postgres epoch (2000-01-01), whereas
> generate_uuidv7_r describes its first argument as milliseconds since the
> Unix epoch. As written that 30-year offset would land in the UUID's
> timestamp field, so the resulting UUID wouldn't be a conformant UUIDv7 and
> wouldn't time-order against UUIDv7s generated through the SQL functions.
>
> Uniqueness is preserved either way, so the rewind logic still works as
> intended but it seemed worth flagging.
>
> I see the conversion used elsewhere as:
>     us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
>          * SECS_PER_DAY * USECS_PER_SEC;
>
> Or, since promotion isn't on a hot path, gettimeofday() / time(NULL)
> directly would also be fine.
>
> Yes, the intention was to use a proper timestamp to allow debugging servers
> if necessary. Switched to gettimeofday() and used 0 for sub-ms since this
> is not going to be critical. (We could use ns here as well, but that would
> only solve a race if you have two servers being promoted in the same ms,
> which I find unlikely, and there is a random number added for that
> situation.)
>
> 2. EOR-record path, the intent is unclear.
>
> The comment above generate_uuidv7_r() says:
>
> "The same UUID is written into the history file and later into the
> XLOG_END_OF_RECOVERY record so that pg_rewind can distinguish two
> servers..."
>
> But from what I can see only the history-file part actually lands.
> xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn't add
> the UUID, and XLogCtl->ThisTimeLineUUID is written under info_lck without a
> reader (I couldn't grep it).
>
> The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like
> preparation for an EOR-struct extension that ended up not being part of the
> patch.
>
> Was the EOR-record piece something you intended to keep for a follow-up, or
> has it been superseded by the history-file approach?
>
> No, the EOR changes are not needed for the promotion, contrary to what I
> originally thought. Cleaned up the comment and the code and removed all
> traces of changes to the EOR (I hope).
>
> 3. Malformed UUID handling in readTimeLineHistory().
>
> The optional field-4 path is:
>
>     if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
>     {
>         Datum datum = DirectFunctionCall1(uuid_in,
>                                           CStringGetDatum(uuid_str));
>         ...
>     }
>
> uuid_in() raises ereport(ERROR) on a malformed input, while the surrounding
> syntax-error paths in readTimeLineHistory() use FATAL deliberately.
> In practice an ERROR during startup ends up being fatal too, so this isn't
> strictly a bug but it would be nicer to stay consistent.
>
> Agree. I added code to capture the error and raise a FATAL instead (with
> the error message from the uuid_in, in case it is modified it makes sense
> to show this).
>
> Regarding the Tests I have the following thoughts
>
> The two new cases are nice, a few extensions that I think would strengthen
> them:
> 1. A mixed-version case where one side has a zero UUID.
> That's the path we're claiming is graceful, but nothing currently
> exercises it.
>
> Yes, that should work regardless of whether the source or the target has
> the zero UUID.
>
> I realized one thing: if two timelines have identical TLI but one has a
> zero UUID and one does not, it seems they could not come from the same
> promotion (one promotion happened on an old server and the other one on a
> new server), that is, they should be treated as different. Does that make
> sense? I made the necessary changes in the attached patches for testing.
> Please have a look.
>
> 2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that
> findCommonAncestorTimeline's loop walks past matching entries before
> hitting the mismatch. The 0002 test puts the divergence at depth 1.
>
> I was unsure if this test was necessary or interesting, hence a separate
> commit. Since you thought it was useful, it's now rolled into the patch and
> I extended the tests with the scenarios you suggested.
>
> I also did some refactorings of the tests to avoid duplication. More below.
>
> 3. A small assertion against the on-disk 00000002.history contents, to pin
> down the file format.
> 4. On 0002 the dependency on restore_command pointing at node_x's pg_wal is
> the kind of thing that tends to break under environment changes. A
> CHECKPOINT on node_x before the backup, or wal_keep_size as in 0001, would
> let the test stand on its own.
>
> Good point.
>
> I refactored the code to avoid some duplication and make the test flow
> self-explanatory and as part of that I set the wal_keep_size for all nodes.
>
> In the process I noticed that many of the functions in RewindTest.pm do the
> same job as the primitives I wrote, but have hard-coded variable names.
> I could rewrite them to take parameters, but that would be quite a big
> patch to add additional changes to each call site, so I did not do that and
> rather added small wrappers specific for the tests in 005_same_timeline.pl.
>
> Attached a new version of the now single patch.
>
> I'm happy to keep reviewing/contributing, thanks again for working on it.
>
> Thank you for reviewing it.

Thank you for your work. I have one comment.

+	a = &tlh->source[tlh->sourceNentries - 2].tluuid;
+	b = &tlh->target[tlh->targetNentries - 2].tluuid;
+
+	if (memcmp(a, &zero, UUID_LEN) == 0 && memcmp(b, &zero, UUID_LEN) == 0)
+		return true;
+
+	return memcmp(a, b, UUID_LEN) == 0;

Since we already have matchingTimelineUUID(), the above code can be
simplified using it.

-- 
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
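
For illustration, here is a minimal standalone sketch of the comparison in the
hunk above. It is not the patch's code: sketch_uuid is a hypothetical stand-in
for the backend UUID type, both function names are made up for this example,
and the real signature of matchingTimelineUUID() is not shown in this thread,
so it is deliberately not reproduced. The sketch only shows that two all-zero
UUIDs are already byte-equal, so the explicit zero check does not change the
result, and that a zero UUID against a non-zero one compares as different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical stand-in for the backend UUID type, so this compiles alone. */
typedef struct
{
	unsigned char data[UUID_LEN];
} sketch_uuid;

/* The comparison as written in the quoted hunk (names are illustrative). */
bool
uuid_match_original(const sketch_uuid *a, const sketch_uuid *b)
{
	static const sketch_uuid zero = {{0}};

	assert(a != NULL && b != NULL);

	/* Both sides zero: treated as matching. */
	if (memcmp(a, &zero, UUID_LEN) == 0 && memcmp(b, &zero, UUID_LEN) == 0)
		return true;

	return memcmp(a, b, UUID_LEN) == 0;
}

/*
 * Same result with a single memcmp(): if both UUIDs are all zeros they are
 * byte-equal, so the special case above is already covered.
 */
bool
uuid_match_simplified(const sketch_uuid *a, const sketch_uuid *b)
{
	assert(a != NULL && b != NULL);

	return memcmp(a, b, UUID_LEN) == 0;
}
```

Whether the existing helper can be dropped in directly depends on its actual
arguments and semantics, which would need to be checked against the patch.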