From: azeem subhani
Date: Sat, 13 Jul 2024 16:33:11 +0500
Subject: pg_rewind Issue: Trying to read Incorrect WAL file for checkpoint record
To: pgsql-general@postgresql.org

Hi,

I have a 1 primary, 2 standby streaming replication cluster. The primary node (`pgsql/16-b/data`) shut down to trigger a failover. One standby (`psql/16-a/data`) was promoted to primary, and write operations were performed on the new primary and replicated to the other standby.

I attempted to use pg_rewind to sync the failed primary with the current primary, but it failed. The error shows pg_rewind is trying to read from an incorrect WAL file `000000010000000000000015` instead of `000000040000000000000015` as per the control data:

pg_rewind: servers diverged at WAL location 150000A0/0 on timeline 4
pg_rewind: error: could not open file "/var/lib/pgsql/16-b/data/pg_wal/000000010000000000000015": No such file or directory
pg_rewind: error: could not read WAL record at 0/15000028

*Failed Primary Control Data:*
pg_control version number: 1300
Catalog version number: 202307071
Database system identifier: 7389684959150618064
Database cluster state: shut down
pg_control last modified: Sat 13 Jul 2024 08:50:32 AM +04
Latest checkpoint location: 0/15000028
Latest checkpoint's REDO location: 0/15000028
Latest checkpoint's REDO WAL file: 000000040000000000000015
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4

*Current Primary Control Data:*
pg_control version number: 1300
Catalog version number: 202307071
Database system identifier: 7389684959150618064
Database cluster state: in production
pg_control last modified: Sat 13 Jul 2024 12:09:43 PM +04
Latest checkpoint location: 0/15009590
Latest checkpoint's REDO location: 0/15009558
Latest checkpoint's REDO WAL file: 000000050000000000000015
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5

Could anyone provide guidance on resolving this issue? I cannot find the WAL file that pg_rewind is trying to open and read in the data/pg_wal directory or my WAL archive directory.

--
Thanks
Azeem Subhani
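
For reference, the two WAL file names mentioned in the report can be derived from a timeline ID and an LSN. The following is a minimal sketch, assuming the default 16 MB WAL segment size (the cluster's actual segment size is not stated in the message). The helper names `parse_lsn` and `wal_file_name` are hypothetical illustrations, not PostgreSQL APIs.

```python
# Assumed default WAL segment size; the report does not state the real value.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'hi/lo' (both hexadecimal) to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-hex-digit name of the WAL segment that contains the LSN."""
    segno = parse_lsn(lsn) // segment_size
    segments_per_xlogid = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


# Checkpoint LSN of the failed primary, looked up on timeline 4 and on timeline 1.
print(wal_file_name(4, "0/15000028"))  # 000000040000000000000015
print(wal_file_name(1, "0/15000028"))  # 000000010000000000000015
```

The same LSN maps to a different file name on each timeline: only the first eight hex digits (the timeline ID) change, which is the only difference between the file pg_rewind tried to open and the one named in the control data.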