From: Maksim.Melnikov <[email protected]>
To: [email protected]
Subject: Incorrect checksum in control file with pg_rewind test
Date: Thu, 4 Sep 2025 18:18:30 +0300
Message-ID: <[email protected]> (raw)

Hi, hackers!

I've hit a failure in the pg_rewind tests, and it looks like we have a
read/write race on the pg_control file. The test fails with "incorrect
checksum in control file". The build was compiled with the
-DEXEC_BACKEND flag.

# +++ tap check in src/bin/pg_rewind +++
Bailout called.  Further testing stopped:  pg_ctl start failed
t/001_basic.pl ...............
Dubious, test returned 255 (wstat 65280, 0xff00)
All 20 subtests passed

2025-05-07 15:00:39.353 MSK [2002308] LOG:  starting backup recovery with redo LSN 0/2000028, checkpoint LSN 0/2000070, on timeline ID 1
2025-05-07 15:00:39.354 MSK [2002307] FATAL:  incorrect checksum in control file
2025-05-07 15:00:39.354 MSK [2002308] LOG:  redo starts at 0/2000028
2025-05-07 15:00:39.354 MSK [2002308] LOG:  completed backup recovery with redo LSN 0/2000028 and end LSN 0/2000138
2025-05-07 15:00:39.354 MSK [2002301] LOG:  background writer process (PID 2002307) exited with exit code 1
2025-05-07 15:00:39.354 MSK [2002301] LOG:  terminating any other active server processes
2025-05-07 15:00:39.355 MSK [2002301] LOG:  shutting down because restart_after_crash is off
# No postmaster PID for node "primary_remote"
[15:00:39.438](0.238s) Bail out!  pg_ctl start failed

The failure occurred while restarting the primary node to check that the
rewind went correctly.
The error is very rare and difficult to reproduce.

It seems there is a race between the process that replays WAL at startup
and updates the control file, and the other sub-processes that were
started with exec and read the control file. As a result, a sub-process
can read a partially updated file with an incorrect CRC. The reason is
that LocalProcessControlFile doesn't acquire ControlFileLock, and it
can't do so.
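
To make the failure mode concrete, here is a minimal, self-contained sketch of how a torn read produces a checksum mismatch. It is not PostgreSQL code: DemoControl, demo_crc32c, demo_seal, demo_crc_ok and demo_torn_read are hypothetical names invented for this illustration, and the struct is only a small stand-in for the real control file layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the control file: payload followed by its CRC. */
typedef struct DemoControl
{
	uint64_t	checkpoint_lsn;
	uint64_t	redo_lsn;
	uint32_t	crc;			/* covers every byte before this field */
} DemoControl;

/* Bitwise CRC-32C (Castagnoli polynomial); slow but dependency-free. */
uint32_t
demo_crc32c(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t	crc = 0xFFFFFFFFu;

	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Writer side: store the CRC of the payload into the image. */
void
demo_seal(DemoControl *c)
{
	c->crc = demo_crc32c(c, offsetof(DemoControl, crc));
}

/* Reader side: recompute the CRC and compare it with the stored one. */
int
demo_crc_ok(const DemoControl *c)
{
	return c->crc == demo_crc32c(c, offsetof(DemoControl, crc));
}

/*
 * Simulate a reader that raced with a writer: the first "split" bytes come
 * from the new image, the remaining bytes still come from the old image.
 */
DemoControl
demo_torn_read(const DemoControl *old_img, const DemoControl *new_img,
			   size_t split)
{
	DemoControl seen = *old_img;

	assert(split <= sizeof(DemoControl));
	memcpy(&seen, new_img, split);
	return seen;
}
```

Both the old and the new image verify on their own. An image that mixes the new checkpoint_lsn with the old redo_lsn and the old CRC does not, which matches the symptom a reader without the lock could see.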

I found the thread
https://www.postgresql.org/message-id/flat/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de,
where a similar issue was discussed for frontend programs. The decision
there was to retry the control file read on CRC failure. Details can be
found in commit 5725e4ebe7a936f724f21e7ee1e84e54a70bfd83. My suggestion
is to use the same approach here. A patch is attached.
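
The retry rule can be sketched outside of PostgreSQL as follows. This is an illustration under stated assumptions, not the patch itself: DemoImage, demo_checksum, DemoScript, demo_scripted_read and demo_read_with_retry are hypothetical names, the checksum is a toy function rather than CRC-32C, and the short sleep between attempts is left out to keep the sketch portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a checksummed file image. */
typedef struct DemoImage
{
	uint32_t	payload;
	uint32_t	crc;
} DemoImage;

/* Toy checksum, assumed for the demo only; it stands in for CRC-32C. */
uint32_t
demo_checksum(uint32_t payload)
{
	return payload * 2654435761u + 1u;
}

/* A reader callback fills *out with whatever is currently "on disk". */
typedef void (*demo_read_fn) (DemoImage *out, void *arg);

/* Scripted reader for demos: returns images in order, repeating the last. */
typedef struct DemoScript
{
	const DemoImage *images;
	size_t		count;
	size_t		next;
} DemoScript;

void
demo_scripted_read(DemoImage *out, void *arg)
{
	DemoScript *s = arg;
	size_t		i;

	assert(s->count > 0);
	i = s->next < s->count ? s->next : s->count - 1;
	*out = s->images[i];
	s->next++;
}

/*
 * Re-read until the checksum matches.  Give up once the same bad checksum
 * is computed twice in a row (a concurrent write is then unlikely) or after
 * max_retries retries.  *attempts is incremented once per read.
 */
bool
demo_read_with_retry(demo_read_fn read_image, void *arg, DemoImage *out,
					 int max_retries, int *attempts)
{
	uint32_t	last_crc = 0;
	int			retries = 0;

	for (;;)
	{
		uint32_t	crc;

		read_image(out, arg);
		if (attempts != NULL)
			(*attempts)++;

		crc = demo_checksum(out->payload);
		if (crc == out->crc)
			return true;		/* consistent image */

		if ((retries > 0 && crc == last_crc) || retries >= max_retries)
			return false;		/* stable mismatch or retry budget used up */

		retries++;
		last_crc = crc;
		/* A real implementation would sleep briefly here before re-reading. */
	}
}
```

With a reader that returns one torn image followed by a good one, the function succeeds on the second read. With a reader that keeps returning the same bad image, it gives up after the second read instead of using the whole retry budget.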

Best regards,
Maksim Melnikov


Attachments:

  [text/x-patch] v1-0001-Try-to-handle-torn-reads-of-pg_control-in-sub-pos.patch (2.3K, 2-v1-0001-Try-to-handle-torn-reads-of-pg_control-in-sub-pos.patch)
From c7e55c28bceca7ac3a659860e1f19d5243c1499a Mon Sep 17 00:00:00 2001
From: Maksim Melnikov <[email protected]>
Date: Thu, 4 Sep 2025 17:37:47 +0300
Subject: [PATCH v1] Try to handle torn reads of pg_control in sub postmaster
 processes.

The same problem was fixed in 63a582222c6b3db2b1103ddf67a04b31a8f8e9bb,
but only for frontends. This commit fixes it for the case where the
pg_control file is read by fork/exec'd processes.

There can be a race between the process that replays WAL at startup and
updates the control file, and other sub-processes that were started with
exec and read the control file. As a result, a sub-process can read a
partially updated file with an incorrect CRC. The reason is that
LocalProcessControlFile doesn't acquire ControlFileLock, and it can't
do so.

This patch is a copy of the changes applied for frontends, with minor
adaptation.
---
 src/backend/access/transam/xlog.c | 33 ++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7ffb2179151..98f992aa812 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4347,6 +4347,15 @@ ReadControlFile(void)
 	int			fd;
 	char		wal_segsz_str[20];
 	int			r;
+	bool		crc_ok;
+#ifdef EXEC_BACKEND
+	pg_crc32c	last_crc;
+	int			retries = 0;
+
+	INIT_CRC32C(last_crc);
+
+retry:
+#endif
 
 	/*
 	 * Read data...
@@ -4411,7 +4420,29 @@ ReadControlFile(void)
 				offsetof(ControlFileData, crc));
 	FIN_CRC32C(crc);
 
-	if (!EQ_CRC32C(crc, ControlFile->crc))
+	crc_ok = EQ_CRC32C(crc, ControlFile->crc);
+
+#ifdef EXEC_BACKEND
+
+	/*
+	 * If the server was writing at the same time, it is possible that we read
+	 * partially updated contents on some systems.  If the CRC doesn't match,
+	 * retry a limited number of times until we compute the same bad CRC twice
+	 * in a row with a short sleep in between.  Then the failure is unlikely
+	 * to be due to a concurrent write.
+	 */
+	if (!crc_ok &&
+		(retries == 0 || !EQ_CRC32C(crc, last_crc)) &&
+		retries < 10)
+	{
+		retries++;
+		last_crc = crc;
+		pg_usleep(10000);
+		goto retry;
+	}
+#endif
+
+	if (!crc_ok)
 		ereport(FATAL,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("incorrect checksum in control file")));
-- 
2.43.0


