From: Ishan joshi
To: pgsql-general@lists.postgresql.org
Subject: Replication to standby broke with WAL file corruption
Date: Fri, 13 Mar 2026 10:41:46 +0000

Hi Team,

I found an issue with a PG v16.9 Patroni setup where replication to our standby node and to our disaster recovery (DR) site broke with the error below. It looks like WAL corruption, and the affected WAL later became part of an archive file.


CONTEXT:  WAL redo at 184F3/F248B6F0 for Heap/LOCK: xmax: 2818115117, off:35, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x00; blkref #0: rel 1663/33195/410203483, blk 25329"
PANIC:  WAL contains references to invalid pages"
CONTEXT:  WAL redo at 184F3/F248B6F0 for Heap/LOCK: xmax: 2818115117, off:35, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x00; blkref #0: rel 1663/33195/410203483, blk 25329"
WARNING:  page 25329 of relation base/33195/410203483 does not exist"
INFO: no action. I am (pg-patroni-node1-0), a secondary, and following a leader (pg-patroni-node2-0)"
[61]LOG:  terminating any other active server processes"
[61]LOG:  startup process (PID 72) was terminated by signal 6: Aborted"
[61]LOG:  shutting down due to startup process failure"
[61]LOG:  database system is shut down"
INFO: establishing a new patroni heartbeat connection to postgres"
INFO: Lock owner: pg-patroni-node2-0; I am pg-patroni-node1-0"
WARNING: Retry got exception: connection problems"
WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role"
INFO: Error communicating with PostgreSQL. Will try again later"
WARNING: Postgresql is not running."
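
For anyone who wants to find the WAL segment that holds the failing record, here is a minimal Python sketch that parses the CONTEXT line above and derives the segment file name and the position of the missing page. It is not part of the original report and rests on assumptions the report does not state: the default 16 MiB WAL segment size, the default 8 kB block size, and timeline 1. The function names are hypothetical.

```python
import re

# Assumptions (not stated in the report): default build-time sizes and timeline 1.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # default wal_segment_size
BLOCK_SIZE = 8192                     # default block size
RELATION_SEGMENT_SIZE = 1024 ** 3     # relation files are split at 1 GiB by default

CONTEXT_RE = re.compile(
    r"WAL redo at (?P<hi>[0-9A-F]+)/(?P<lo>[0-9A-F]+) for (?P<rmgr>\S+):"
    r".*?rel\s*(?P<spc>\d+)/(?P<db>\d+)/(?P<rel>\d+), blk (?P<blk>\d+)"
)


def parse_context(line):
    """Extract the LSN, record type and block reference from a CONTEXT line."""
    m = CONTEXT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised WAL redo CONTEXT line")
    return {
        "lsn": (int(m["hi"], 16) << 32) | int(m["lo"], 16),
        "record": m["rmgr"],
        "tablespace": int(m["spc"]),
        "database": int(m["db"]),
        "relfilenode": int(m["rel"]),
        "block": int(m["blk"]),
    }


def wal_file_name(lsn, timeline=1):
    """Name of the WAL segment containing lsn (timeline is an assumption)."""
    segno = lsn // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


def block_location(block):
    """Relation segment number and byte offset of a block within that segment."""
    byte_pos = block * BLOCK_SIZE
    return byte_pos // RELATION_SEGMENT_SIZE, byte_pos % RELATION_SEGMENT_SIZE


line = ("CONTEXT:  WAL redo at 184F3/F248B6F0 for Heap/LOCK: xmax: 2818115117, "
        "off:35, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x00; "
        "blkref #0: rel 1663/33195/410203483, blk 25329")
info = parse_context(line)
print(wal_file_name(info["lsn"]))      # 00000001000184F3000000F2 (if timeline 1)
print(block_location(info["block"]))   # (0, 207495168)
```

Under those assumptions the record sits in segment 00000001000184F3000000F2, and block 25329 would start about 198 MiB into the first file of relation base/33195/410203483. The same segment in the archive could then be examined with pg_waldump, adjusting the timeline prefix to the real one.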


The primary database was not impacted; however, replication to the standby node and to the DR site broke. I tried to reinitialize from the latest pgBackRest backup plus archive replay, but it failed with the same error as soon as the changes from the corrupt WAL/archive file were applied. I had to reinitialize with pg_basebackup, which took about 45 hours for the 40 TB database.

As I understand it, a transaction created a table, performed DML, and then dropped the table (or the transaction may have been rolled back). That seems to have caused a race condition during WAL generation, and applying the resulting WAL then failed on the standby/DR site.

This looks like a bug. Any suggestions for this scenario?

Thanks & Regards,
Ishan Joshi