public inbox for [email protected]  
help / color / mirror / Atom feed
From: Alexander Korotkov <[email protected]>
To: Maksim.Melnikov <[email protected]>
Cc: [email protected]
Subject: Re: Incorrect checksum in control file with pg_rewind test
Date: Tue, 21 Apr 2026 15:12:58 +0300
Message-ID: <CAPpHfdsXkEWUeLUG4zh9q=hjpsOCMgsbN_XZh-6JL0z1NaNMqQ@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

Hi, Maksim!

On Fri, Nov 7, 2025 at 5:19 PM Maksim.Melnikov
<[email protected]> wrote:
> just to clarify, it isn't pg_rewind related issue and can fire
> spontaneously.
> I don't have any strong scenario how to reproduce it, tests sometimes
> fired on our local CI, but as you can see on thread [1],
> where the same issue for frontends was discussed, it is very hard to
> reproduce and there wasn't scenario how to do it too.
>
> Some dirty hacks to reproduce it was described here [2], and I've tried
> it on master branch:
> First of all I applied patch
> 0001-XXX-Dirty-hack-to-clobber-control-file-for-testing.patch from [2],
> then compile app with
> -DEXEC_BACKEND and exec command in psql
> do $$ begin loop perform pg_update_control_file(); end loop; end; $$;
> Also I've run pgbench command
> for run in {1..5000}; do pgbench -c50 -t100 -j6 -S postgres ; done
> And eventually got error
>
> 2025-11-07 17:58:33.139 MSK [2472504] FATAL:  incorrect checksum in
> control file
> 2025-11-07 17:58:33.141 MSK [2472501] LOG:  could not receive data from
> client: Connection reset by peer
> 2025-11-07 17:58:33.143 MSK [2472505] LOG:  could not send data to
> client: Broken pipe
> 2025-11-07 17:58:33.143 MSK [2472505] FATAL:  connection to client lost

Thank you for spotting this issue and proposing a patch.  The fork
builds don't have this problem, because fork replicated contents of
LocalControlFile to the new process.  And the postmaster has
consistent snapshot of control file as there is no concurrent process
which could write it and that moment.  But EXEC_BACKEND, even with
your patch, may end up different processes with different contents of
LocalControlFile.  I don't see it could cause a material bug right
now, but I see this as undesirable divergence between fork and
EXEC_BACKEND behaviors.  I propose an alternative approach copy the
contents of control file to the new process via BackendParameters.
This approach solves two problems at once: no torn reads, and no
divergence between fork and EXEC_BACKEND.

------
Regards,
Alexander Korotkov
Supabase


Attachments:

  [application/octet-stream] v1-0001-Inherit-pg_control-snapshot-into-EXEC_BACKEND-sub.patch (11.1K, 2-v1-0001-Inherit-pg_control-snapshot-into-EXEC_BACKEND-sub.patch)
  download | inline diff:
From 78a5a6a977352b2d6de9851299db7e3ca47080cc Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <[email protected]>
Date: Tue, 21 Apr 2026 14:55:48 +0300
Subject: [PATCH v1] Inherit pg_control snapshot into EXEC_BACKEND
 sub-processes

Under EXEC_BACKEND, each sub-process re-reads pg_control from disk in
SubPostmasterMain() via LocalProcessControlFile().  That read is not
serialized against update_controlfile(), which writes the file
non-atomically in the startup process or in the checkpointer.  The
sub-process can therefore observe a torn image and fail the CRC check.

Also different sub-processes can end up with different versions of
LocalControlFile.  It doesn't seems to cause material bug yet, but it would
be better to stick fork and EXEC_BACKEND builds to the same behavior.

This commit reproduces fork behavior on  EXEC_BACKEND by passing
the postmaster's LocalControlFile copy through BackendParameters and
having the sub-process consume that snapshot instead of re-reading
from disk.

Now, LocalProcessControlFile() have a new an optional 'source' argument:
when NULL it reads from the disk as before; when non-NULL it installs the
provided snapshot and runs the same CRC, compatibility, and GUC
setup steps.  GetLocalControlFileCopy() is a small accessor that
copies out the postmaster's LocalControlFile, so launch_backend.c
does not reach into xlog.c globals directly.

Discussion: https://www.postgresql.org/message-id/f59335a4-83ff-438a-a30e-7cf2200276b6%40postgrespro.ru
---
 src/backend/access/transam/xlog.c       | 89 +++++++++++++++++--------
 src/backend/postmaster/launch_backend.c | 32 ++++++++-
 src/backend/postmaster/postmaster.c     |  4 +-
 src/backend/tcop/postgres.c             |  2 +-
 src/include/access/xlog.h               |  4 +-
 5 files changed, 95 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f85b5286086..fa953f5a43a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -724,7 +724,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static bool PerformRecoveryXLogAction(void);
 static void InitControlFile(uint64 sysidentifier, uint32 data_checksum_version);
 static void WriteControlFile(void);
-static void ReadControlFile(void);
+static void ReadControlFile(ControlFileData *source);
 static void UpdateControlFile(void);
 static char *str_time(pg_time_t tnow, char *buf, size_t bufsize);
 
@@ -4407,42 +4407,56 @@ WriteControlFile(void)
 }
 
 static void
-ReadControlFile(void)
+ReadControlFile(ControlFileData *source)
 {
 	pg_crc32c	crc;
-	int			fd;
 	char		wal_segsz_str[20];
-	int			r;
 
 	/*
-	 * Read data...
+	 * If a source snapshot was provided, use it instead of reading from disk.
+	 * This path is used under EXEC_BACKEND to inherit pg_control from the
+	 * postmaster via BackendParameters, matching fork semantics and avoiding
+	 * torn reads of the on-disk file.
 	 */
-	fd = BasicOpenFile(XLOG_CONTROL_FILE,
-					   O_RDWR | PG_BINARY);
-	if (fd < 0)
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not open file \"%s\": %m",
-						XLOG_CONTROL_FILE)));
-
-	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
-	r = read(fd, ControlFile, sizeof(ControlFileData));
-	if (r != sizeof(ControlFileData))
+	if (source)
+	{
+		memcpy(ControlFile, source, sizeof(ControlFileData));
+	}
+	else
 	{
-		if (r < 0)
+		int			fd;
+		int			r;
+
+		/*
+		 * Read data...
+		 */
+		fd = BasicOpenFile(XLOG_CONTROL_FILE,
+						   O_RDWR | PG_BINARY);
+		if (fd < 0)
 			ereport(PANIC,
 					(errcode_for_file_access(),
-					 errmsg("could not read file \"%s\": %m",
+					 errmsg("could not open file \"%s\": %m",
 							XLOG_CONTROL_FILE)));
-		else
-			ereport(PANIC,
-					(errcode(ERRCODE_DATA_CORRUPTED),
-					 errmsg("could not read file \"%s\": read %d of %zu",
-							XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
-	}
-	pgstat_report_wait_end();
 
-	close(fd);
+		pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
+		r = read(fd, ControlFile, sizeof(ControlFileData));
+		if (r != sizeof(ControlFileData))
+		{
+			if (r < 0)
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m",
+								XLOG_CONTROL_FILE)));
+			else
+				ereport(PANIC,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg("could not read file \"%s\": read %d of %zu",
+								XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
+		}
+		pgstat_report_wait_end();
+
+		close(fd);
+	}
 
 	/*
 	 * Check for expected pg_control format version.  If this is wrong, the
@@ -4640,6 +4654,19 @@ UpdateControlFile(void)
 	update_controlfile(DataDir, ControlFile, true);
 }
 
+/*
+ * Copy the postmaster's process-local pg_control image into *dest.  This is
+ * the same in-memory copy that a forked backend inherits in its address
+ * space, and it is used under EXEC_BACKEND to hand the image to sub-processes
+ * via BackendParameters instead of having them re-read pg_control from disk.
+ */
+void
+GetLocalControlFileCopy(ControlFileData *dest)
+{
+	Assert(LocalControlFile != NULL);
+	memcpy(dest, LocalControlFile, sizeof(ControlFileData));
+}
+
 /*
  * Returns the unique system identifier from control file.
  */
@@ -5264,14 +5291,18 @@ show_effective_wal_level(void)
  *
  * reset just controls whether previous contents are to be expected (in the
  * reset case, there's a dangling pointer into old shared memory), or not.
+ *
+ * If source is non-NULL, its contents are used instead of reading from disk.
+ * This supports EXEC_BACKEND sub-processes that inherit pg_control from the
+ * postmaster via BackendParameters.
  */
 void
-LocalProcessControlFile(bool reset)
+LocalProcessControlFile(bool reset, struct ControlFileData *source)
 {
 	Assert(reset || ControlFile == NULL);
 	LocalControlFile = palloc_object(ControlFileData);
 	ControlFile = LocalControlFile;
-	ReadControlFile();
+	ReadControlFile(source);
 	SetLocalDataChecksumState(ControlFile->data_checksum_version);
 }
 
@@ -5611,7 +5642,7 @@ BootStrapXLOG(uint32 data_checksum_version)
 	 * Force control file to be read - in contrast to normal processing we'd
 	 * otherwise never run the checks and GUC related initializations therein.
 	 */
-	ReadControlFile();
+	ReadControlFile(NULL);
 }
 
 static char *
diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index 8f3cfea880c..75c6853dece 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -56,6 +56,7 @@
 #include "utils/memutils.h"
 
 #ifdef EXEC_BACKEND
+#include "catalog/pg_control.h"
 #include "nodes/queryjumble.h"
 #include "portability/instr_time.h"
 #include "storage/pg_shmem.h"
@@ -141,6 +142,13 @@ typedef struct
 	ClientSocket client_sock;
 	InheritableSocket inh_sock;
 
+	/*
+	 * Snapshot of pg_control taken by the postmaster.  Inherited by the
+	 * sub-process in place of re-reading the file from disk, to avoid torn
+	 * reads and to match fork() inheritance semantics.
+	 */
+	ControlFileData ControlFile;
+
 	/*
 	 * Extra startup data, content depends on the child process.
 	 */
@@ -150,6 +158,13 @@ typedef struct
 
 #define SizeOfBackendParameters(startup_data_len) (offsetof(BackendParameters, startup_data) + startup_data_len)
 
+/*
+ * Snapshot of pg_control inherited from the postmaster.  Populated from
+ * BackendParameters in restore_backend_variables() and then consumed by the
+ * call to LocalProcessControlFile() in SubPostmasterMain().
+ */
+static ControlFileData InheritedControlFile;
+
 static void read_backend_variables(char *id, void **startup_data, size_t *startup_data_len);
 static void restore_backend_variables(BackendParameters *param);
 
@@ -662,10 +677,11 @@ SubPostmasterMain(int argc, char *argv[])
 	checkDataDir();
 
 	/*
-	 * (re-)read control file, as it contains config. The postmaster will
-	 * already have read this, but this process doesn't know about that.
+	 * Set up control file from the snapshot the postmaster captured for us.
+	 * Reading pg_control from disk here would race with concurrent updates
+	 * from the startup process or checkpointer.
 	 */
-	LocalProcessControlFile(false);
+	LocalProcessControlFile(false, &InheritedControlFile);
 
 	RegisterBuiltinShmemCallbacks();
 
@@ -772,6 +788,14 @@ save_backend_variables(BackendParameters *param,
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
 
+	/*
+	 * Hand the postmaster's process-local pg_control image to the
+	 * sub-process.  This mirrors what fork() inheritance does on platforms
+	 * without EXEC_BACKEND, and it avoids re-reading pg_control from disk
+	 * concurrently with the startup process or checkpointer updating it.
+	 */
+	GetLocalControlFileCopy(&param->ControlFile);
+
 	param->startup_data_len = startup_data_len;
 	if (startup_data_len > 0)
 		memcpy(param->startup_data, startup_data, startup_data_len);
@@ -1029,6 +1053,8 @@ restore_backend_variables(BackendParameters *param)
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
 
+	memcpy(&InheritedControlFile, &param->ControlFile, sizeof(ControlFileData));
+
 	/*
 	 * We need to restore fd.c's counts of externally-opened FDs; to avoid
 	 * confusion, be sure to do this after restoring max_safe_fds.  (Note:
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b6fd332f196..9c92ac09bd3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -916,7 +916,7 @@ PostmasterMain(int argc, char *argv[])
 	 * processes will inherit the correct function pointer and not need to
 	 * repeat the test.
 	 */
-	LocalProcessControlFile(false);
+	LocalProcessControlFile(false, NULL);
 
 	/*
 	 * Register the apply launcher.  It's probably a good idea to call this
@@ -3265,7 +3265,7 @@ PostmasterStateMachine(void)
 		shmem_exit(1);
 
 		/* re-read control file into local memory */
-		LocalProcessControlFile(true);
+		LocalProcessControlFile(true, NULL);
 
 		/*
 		 * Re-initialize shared memory and semaphores.  Note: We don't call
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2c1f14b7889..a35d99649e4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4175,7 +4175,7 @@ PostgresSingleUserMain(int argc, char *argv[],
 	CreateDataDirLockFile(false);
 
 	/* read control file (error checking and contains config ) */
-	LocalProcessControlFile(false);
+	LocalProcessControlFile(false, NULL);
 
 	/* Register the shared memory needs of all core subsystems. */
 	RegisterBuiltinShmemCallbacks();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 437b4f32349..aab638e6a10 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -261,7 +261,9 @@ extern bool GetDefaultCharSignedness(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern void BootStrapXLOG(uint32 data_checksum_version);
 extern void InitializeWalConsistencyChecking(void);
-extern void LocalProcessControlFile(bool reset);
+struct ControlFileData;
+extern void LocalProcessControlFile(bool reset, struct ControlFileData *source);
+extern void GetLocalControlFileCopy(struct ControlFileData *dest);
 extern WalLevel GetActiveWalLevelOnStandby(void);
 extern void StartupXLOG(void);
 extern void ShutdownXLOG(int code, Datum arg);
-- 
2.39.5 (Apple Git-154)



view thread (6+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Incorrect checksum in control file with pg_rewind test
  In-Reply-To: <CAPpHfdsXkEWUeLUG4zh9q=hjpsOCMgsbN_XZh-6JL0z1NaNMqQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox