From: Ishan joshi
To: Tomas Vondra, "pgsql-general@lists.postgresql.org"
Subject: Re: Replication to standby broke with WAL file corruption
Date: Mon, 16 Mar 2026 06:04:51 +0000

Thanks, Tomas, for the reply.

1663/33195/410203483 is a table created by a user inside a transaction. However, the transaction broke and was rolled back, which dropped the table on the primary, so the primary is not impacted. The WAL file, however, seems to be corrupt at the point where this transaction (create table -> DML -> rollback) was written: the DML is logged first, and it is then applied on the standby and the DR site, where the table has not been created. It looks like a race condition while writing the WAL file.

This is a common scenario: if a transaction breaks, it should be rolled back, and the operations of the transaction should be logged in the WAL file in sequence. In this case, the DML operation comes before the table creation in the WAL, which broke the replication.

Thanks & Regards,
Ishan Joshi

From: Tomas Vondra <tomas@vondra.me>
Sent: 16 March 2026 04:39
To: Ishan joshi <ishanjoshi@live.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Replication to standby broke with WAL file corruption

On 3/13/26 11:41, Ishan joshi wrote:
> Hi Team,
>
> I found an issue with PG v16.9 patroni setup where our standby node
> replication and disaster replication site replication broken with below
> error. It looks like WAL corruption which later part of archive file.
>
>
> CONTEXT:  WAL redo at 184F3/F248B6F0 for Heap/LOCK: xmax: 2818115117,
> off:35, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x00; blkref #0: rel
> 1663/33195/410203483, blk 25329"
> PANIC:  WAL contains references to invalid pages"
> CONTEXT:  WAL redo at 184F3/F248B6F0 for Heap/LOCK: xmax: 2818115117,
> off:35, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x00; blkref #0:
> rel1663/33195/410203483, blk 25329"
> WARNING:  page 25329 of relation base/33195/410203483 does not exist"
> INFO: no action. I am (pg-patroni-node1-0), a secondary, and following a
> leader (pg-patroni-node2-0)"
> [61]LOG:  terminating any other active server processes"
> [61]LOG:  startup process (PID 72) was terminated by signal 6: Aborted"
> [61]LOG:  shutting down due to startup process failure"
> [61]LOG:  database system is shut down"
> INFO: establishing a new patroni heartbeat connection to postgres"
> INFO: Lock owner: pg-patroni-node2-0; I am pg-patroni-node1-0"
> WARNING: Retry got exception: connection problems"
> WARNING: Failed to determine PostgreSQL state from the connection,
> fallingback to cached role"
> INFO: Error communicating with PostgreSQL. Will try again later"
> WARNING: Postgresql is not running."
>
>
> Primary db was not impacted, however standby node and DR site
> replication broken, I tried to reinit with latest backup + archive
> loading from pgbackrest backup but it fails with same error once the
> corrupt wal/archive file applying the changes. I had to reinit with
> pgbasebackup with 40TB database which took about 45 hrs of time.
>
> As I understand the transcation create table ->performed DML and then
> drop the table or transaction could be rollback that makes RACE
> condition in WAL file creation and got failed while applying the same in
> standby/DR site.
>

It's hard to say what caused this, but it might be interesting to look
at the WAL using pg_waldump. First at the WAL segment containing the
record triggering the failure, and then also at WAL segments before that
containing references to relation 1663/33195/410203483 (and especially
page 25329).
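
As a rough sketch of that first step, the Python below derives the name of the WAL segment file that holds LSN 184F3/F248B6F0 and assembles a pg_waldump command line filtered to the relation and block from the error. It assumes the default 16 MB WAL segment size and timeline 1, neither of which is stated in this thread. The directory /path/to/wal and the helper names are hypothetical, and the --relation and --block filters need pg_waldump from PostgreSQL 15 or later.

```python
import shlex

# Assumption: the cluster uses the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '184F3/F248B6F0' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-character WAL file name that contains the given LSN."""
    segments_per_xlogid = 0x100000000 // segment_size
    segno = parse_lsn(lsn) // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


def waldump_command(wal_dir: str, timeline: int, start_lsn: str,
                    end_lsn: str, relation: str, block=None) -> str:
    """Build a pg_waldump command showing records for one relation (and block)."""
    args = [
        "pg_waldump",
        "--path", wal_dir,
        "--timeline", str(timeline),
        "--start", start_lsn,
        "--end", end_lsn,
        "--relation", relation,  # tablespace/database/relfilenode
    ]
    if block is not None:
        args += ["--block", str(block)]
    return shlex.join(args)


if __name__ == "__main__":
    failing_lsn = "184F3/F248B6F0"
    # Timeline 1 is an assumption; check the actual timeline of the cluster.
    print(wal_segment_name(failing_lsn, timeline=1))
    # Scan from the start of that segment up to the failing record.
    print(waldump_command("/path/to/wal", 1, "184F3/F2000000", failing_lsn,
                          "1663/33195/410203483", block=25329))
```

To look at earlier segments, move the start LSN back; dropping the block filter shows every record for the relation, which should include its creation if that was logged.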

It is interesting this succeeded on a primary, but failed on standby.

Is there anything special about the relation 1663/33195/410203483? Do
you know if it's a regular / temporary table, etc?


regards

--
Tomas Vondra
