public inbox for [email protected]
help / color / mirror / Atom feedIncorrect checksum in control file with pg_rewind test
6+ messages / 4 participants
[nested] [flat]
* Incorrect checksum in control file with pg_rewind test
@ 2025-09-04 15:18 Maksim.Melnikov <[email protected]>
2025-11-07 15:19 ` Re: Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
0 siblings, 1 reply; 6+ messages in thread
From: Maksim.Melnikov @ 2025-09-04 15:18 UTC (permalink / raw)
To: [email protected]
Hi, hackers!
I've got test failure for pg_rewind tests and it seems we have
read/write races
for pg_control file. The test error is incorrect checksum in control file.
Build was compiled with -DEXEC_BACKEND flag.
# +++ tap check in src/bin/pg_rewind +++
Bailout called. Further testing stopped: pg_ctl start failed
t/001_basic.pl ...............
Dubious, test returned 255 (wstat 65280, 0xff00)
All 20 subtests passed
2025-05-07 15:00:39.353 MSK [2002308] LOG: starting backup recovery
with redo LSN 0/2000028, checkpoint LSN 0/2000070, on timeline ID 1
2025-05-07 15:00:39.354 MSK [2002307] FATAL: incorrect checksum in
control file
2025-05-07 15:00:39.354 MSK [2002308] LOG: redo starts at 0/2000028
2025-05-07 15:00:39.354 MSK [2002308] LOG: completed backup recovery
with redo LSN 0/2000028 and end LSN 0/2000138
2025-05-07 15:00:39.354 MSK [2002301] LOG: background writer process
(PID 2002307) exited with exit code 1
2025-05-07 15:00:39.354 MSK [2002301] LOG: terminating any other active
server processes
2025-05-07 15:00:39.355 MSK [2002301] LOG: shutting down because
restart_after_crash is off
2025-05-07 15:00:39.356 MSK [2002301] LOG: database system is shut down
# No postmaster PID for node "primary_remote"
[15:00:39.438](0.238s) Bail out! pg_ctl start failed
Failure occurred during restart the primary node to check that rewind
went correctly.
Error is very rare and difficult to reproduce.
It seems we have race between process that replays WAL on start and
update control
file and other sub-processes that read control file and were started
with exec.
As the result sub-processes can read partially updated file with
incorrect crc.
The reason is that LocalProcessControlFile don't acquire ControlFileLock
and it
can't do it.
I found thread
https://www.postgresql.org/message-id/flat/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de,
where the similiar issue was discussed for frontend programs. The
decision was
to retry control file read in case of crc failures. Details can be found
in commit
5725e4ebe7a936f724f21e7ee1e84e54a70bfd83. My suggestion is to use this
approach
here. Patch is attached.
Best regards,
Maksim Melnikov
Attachments:
[text/x-patch] v1-0001-Try-to-handle-torn-reads-of-pg_control-in-sub-pos.patch (2.3K, 2-v1-0001-Try-to-handle-torn-reads-of-pg_control-in-sub-pos.patch)
download | inline diff:
From c7e55c28bceca7ac3a659860e1f19d5243c1499a Mon Sep 17 00:00:00 2001
From: Maksim Melnikov <[email protected]>
Date: Thu, 4 Sep 2025 17:37:47 +0300
Subject: [PATCH v1] Try to handle torn reads of pg_control in sub postmaster
processes.
The same problem was fixed in 63a582222c6b3db2b1103ddf67a04b31a8f8e9bb,
but for frontends. Current commit is fixing this problem for cases
when pg_control file is read by fork/exec'd processes.
There can be race between process, that replays WAL on start and
update control file and other sub-processes that read control file
and were started with exec. As the result sub-processes can read
partially updated file with incorrect crc. The reason is that
LocalProcessControlFile don't acquire ControlFileLock and it can't
do it.
Current patch is just copy-paste of changes, applied for frontends,
with little adaptation.
---
src/backend/access/transam/xlog.c | 33 ++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7ffb2179151..98f992aa812 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4347,6 +4347,15 @@ ReadControlFile(void)
int fd;
char wal_segsz_str[20];
int r;
+ bool crc_ok;
+#ifdef EXEC_BACKEND
+ pg_crc32c last_crc;
+ int retries = 0;
+
+ INIT_CRC32C(last_crc);
+
+retry:
+#endif
/*
* Read data...
@@ -4411,7 +4420,29 @@ ReadControlFile(void)
offsetof(ControlFileData, crc));
FIN_CRC32C(crc);
- if (!EQ_CRC32C(crc, ControlFile->crc))
+ crc_ok = EQ_CRC32C(crc, ControlFile->crc);
+
+#ifdef EXEC_BACKEND
+
+ /*
+ * If the server was writing at the same time, it is possible that we read
+ * partially updated contents on some systems. If the CRC doesn't match,
+ * retry a limited number of times until we compute the same bad CRC twice
+ * in a row with a short sleep in between. Then the failure is unlikely
+ * to be due to a concurrent write.
+ */
+ if (!crc_ok &&
+ (retries == 0 || !EQ_CRC32C(crc, last_crc)) &&
+ retries < 10)
+ {
+ retries++;
+ last_crc = crc;
+ pg_usleep(10000);
+ goto retry;
+ }
+#endif
+
+ if (!crc_ok)
ereport(FATAL,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("incorrect checksum in control file")));
--
2.43.0
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Incorrect checksum in control file with pg_rewind test
2025-09-04 15:18 Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
@ 2025-11-07 15:19 ` Maksim.Melnikov <[email protected]>
2025-11-14 13:15 ` Re: Incorrect checksum in control file with pg_rewind test Ivan Kovmir <[email protected]>
2026-04-21 12:12 ` Re: Incorrect checksum in control file with pg_rewind test Alexander Korotkov <[email protected]>
0 siblings, 2 replies; 6+ messages in thread
From: Maksim.Melnikov @ 2025-11-07 15:19 UTC (permalink / raw)
To: [email protected]
Hi,
just to clarify, it isn't pg_rewind related issue and can fire
spontaneously.
I don't have any strong scenario how to reproduce it, tests sometimes
fired on our local CI, but as you can see on thread [1],
where the same issue for frontends was discussed, it is very hard to
reproduce and there wasn't scenario how to do it too.
Some dirty hacks to reproduce it was described here [2], and I've tried
it on master branch:
First of all I applied patch
0001-XXX-Dirty-hack-to-clobber-control-file-for-testing.patch from [2],
then compile app with
-DEXEC_BACKEND and exec command in psql
do $$ begin loop perform pg_update_control_file(); end loop; end; $$;
Also I've run pgbench command
for run in {1..5000}; do pgbench -c50 -t100 -j6 -S postgres ; done
And eventually got error
2025-11-07 17:58:33.139 MSK [2472504] FATAL: incorrect checksum in
control file
2025-11-07 17:58:33.141 MSK [2472501] LOG: could not receive data from
client: Connection reset by peer
2025-11-07 17:58:33.143 MSK [2472505] LOG: could not send data to
client: Broken pipe
2025-11-07 17:58:33.143 MSK [2472505] FATAL: connection to client lost
Best regards,
Maksim Melnikov
[1]
https://www.postgresql.org/message-id/flat/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
[2]
https://www.postgresql.org/message-id/CA%2BhUKGK-BEe38aKNqHJDQ86LUW-CMwF5F9bo1JtJVg71FoDv_w%40mail.g...
[3]
https://www.postgresql.org/message-id/f59335a4-83ff-438a-a30e-7cf2200276b6%40postgrespro.ru
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Incorrect checksum in control file with pg_rewind test
2025-09-04 15:18 Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
2025-11-07 15:19 ` Re: Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
@ 2025-11-14 13:15 ` Ivan Kovmir <[email protected]>
1 sibling, 0 replies; 6+ messages in thread
From: Ivan Kovmir @ 2025-11-14 13:15 UTC (permalink / raw)
To: Maksim.Melnikov <[email protected]>; +Cc: [email protected]
I can reproduce the bug on the master branch with the following steps:
1. Apply 0001-XXX-Dirty-hack-to-clobber-control-file-for-testing.patch [1]
2. Compile PostgreSQL with `-DEXEC_BACKEND` C compiler flag option.
3. Run `initdb`
4. Run `postgres`
5. Run `pgbench -i`
6. Run `psql -c 'do $$ begin loop perform pg_update_control_file(); end
loop; end; $$;'`
7. Run `for run in {1..5000}; do pgbench -c50 -t100 -j6 -S postgres; done`
in parallel with the previous command.
8. Wait a while.
[1]
https://www.postgresql.org/message-id/CA%2BhUKGK-BEe38aKNqHJDQ86LUW-CMwF5F9bo1JtJVg71FoDv_w%40mail.g...
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Incorrect checksum in control file with pg_rewind test
2025-09-04 15:18 Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
2025-11-07 15:19 ` Re: Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
@ 2026-04-21 12:12 ` Alexander Korotkov <[email protected]>
2026-04-21 18:00 ` Re: Incorrect checksum in control file with pg_rewind test Alexander Lakhin <[email protected]>
2026-05-12 08:30 ` Re: Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
1 sibling, 2 replies; 6+ messages in thread
From: Alexander Korotkov @ 2026-04-21 12:12 UTC (permalink / raw)
To: Maksim.Melnikov <[email protected]>; +Cc: [email protected]
Hi, Maksim!
On Fri, Nov 7, 2025 at 5:19 PM Maksim.Melnikov
<[email protected]> wrote:
> just to clarify, it isn't pg_rewind related issue and can fire
> spontaneously.
> I don't have any strong scenario how to reproduce it, tests sometimes
> fired on our local CI, but as you can see on thread [1],
> where the same issue for frontends was discussed, it is very hard to
> reproduce and there wasn't scenario how to do it too.
>
> Some dirty hacks to reproduce it was described here [2], and I've tried
> it on master branch:
> First of all I applied patch
> 0001-XXX-Dirty-hack-to-clobber-control-file-for-testing.patch from [2],
> then compile app with
> -DEXEC_BACKEND and exec command in psql
> do $$ begin loop perform pg_update_control_file(); end loop; end; $$;
> Also I've run pgbench command
> for run in {1..5000}; do pgbench -c50 -t100 -j6 -S postgres ; done
> And eventually got error
>
> 2025-11-07 17:58:33.139 MSK [2472504] FATAL: incorrect checksum in
> control file
> 2025-11-07 17:58:33.141 MSK [2472501] LOG: could not receive data from
> client: Connection reset by peer
> 2025-11-07 17:58:33.143 MSK [2472505] LOG: could not send data to
> client: Broken pipe
> 2025-11-07 17:58:33.143 MSK [2472505] FATAL: connection to client lost
Thank you for spotting this issue and proposing a patch. The fork
builds don't have this problem, because fork replicated contents of
LocalControlFile to the new process. And the postmaster has
consistent snapshot of control file as there is no concurrent process
which could write it and that moment. But EXEC_BACKEND, even with
your patch, may end up different processes with different contents of
LocalControlFile. I don't see it could cause a material bug right
now, but I see this as undesirable divergence between fork and
EXEC_BACKEND behaviors. I propose an alternative approach copy the
contents of control file to the new process via BackendParameters.
This approach solves two problems at once: no torn reads, and no
divergence between fork and EXEC_BACKEND.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
[application/octet-stream] v1-0001-Inherit-pg_control-snapshot-into-EXEC_BACKEND-sub.patch (11.1K, 2-v1-0001-Inherit-pg_control-snapshot-into-EXEC_BACKEND-sub.patch)
download | inline diff:
From 78a5a6a977352b2d6de9851299db7e3ca47080cc Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <[email protected]>
Date: Tue, 21 Apr 2026 14:55:48 +0300
Subject: [PATCH v1] Inherit pg_control snapshot into EXEC_BACKEND
sub-processes
Under EXEC_BACKEND, each sub-process re-reads pg_control from disk in
SubPostmasterMain() via LocalProcessControlFile(). That read is not
serialized against update_controlfile(), which writes the file
non-atomically in the startup process or in the checkpointer. The
sub-process can therefore observe a torn image and fail the CRC check.
Also different sub-processes can end up with different versions of
LocalControlFile. It doesn't seems to cause material bug yet, but it would
be better to stick fork and EXEC_BACKEND builds to the same behavior.
This commit reproduces fork behavior on EXEC_BACKEND by passing
the postmaster's LocalControlFile copy through BackendParameters and
having the sub-process consume that snapshot instead of re-reading
from disk.
Now, LocalProcessControlFile() have a new an optional 'source' argument:
when NULL it reads from the disk as before; when non-NULL it installs the
provided snapshot and runs the same CRC, compatibility, and GUC
setup steps. GetLocalControlFileCopy() is a small accessor that
copies out the postmaster's LocalControlFile, so launch_backend.c
does not reach into xlog.c globals directly.
Discussion: https://www.postgresql.org/message-id/f59335a4-83ff-438a-a30e-7cf2200276b6%40postgrespro.ru
---
src/backend/access/transam/xlog.c | 89 +++++++++++++++++--------
src/backend/postmaster/launch_backend.c | 32 ++++++++-
src/backend/postmaster/postmaster.c | 4 +-
src/backend/tcop/postgres.c | 2 +-
src/include/access/xlog.h | 4 +-
5 files changed, 95 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f85b5286086..fa953f5a43a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -724,7 +724,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
static bool PerformRecoveryXLogAction(void);
static void InitControlFile(uint64 sysidentifier, uint32 data_checksum_version);
static void WriteControlFile(void);
-static void ReadControlFile(void);
+static void ReadControlFile(ControlFileData *source);
static void UpdateControlFile(void);
static char *str_time(pg_time_t tnow, char *buf, size_t bufsize);
@@ -4407,42 +4407,56 @@ WriteControlFile(void)
}
static void
-ReadControlFile(void)
+ReadControlFile(ControlFileData *source)
{
pg_crc32c crc;
- int fd;
char wal_segsz_str[20];
- int r;
/*
- * Read data...
+ * If a source snapshot was provided, use it instead of reading from disk.
+ * This path is used under EXEC_BACKEND to inherit pg_control from the
+ * postmaster via BackendParameters, matching fork semantics and avoiding
+ * torn reads of the on-disk file.
*/
- fd = BasicOpenFile(XLOG_CONTROL_FILE,
- O_RDWR | PG_BINARY);
- if (fd < 0)
- ereport(PANIC,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m",
- XLOG_CONTROL_FILE)));
-
- pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
- r = read(fd, ControlFile, sizeof(ControlFileData));
- if (r != sizeof(ControlFileData))
+ if (source)
+ {
+ memcpy(ControlFile, source, sizeof(ControlFileData));
+ }
+ else
{
- if (r < 0)
+ int fd;
+ int r;
+
+ /*
+ * Read data...
+ */
+ fd = BasicOpenFile(XLOG_CONTROL_FILE,
+ O_RDWR | PG_BINARY);
+ if (fd < 0)
ereport(PANIC,
(errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
+ errmsg("could not open file \"%s\": %m",
XLOG_CONTROL_FILE)));
- else
- ereport(PANIC,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("could not read file \"%s\": read %d of %zu",
- XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
- }
- pgstat_report_wait_end();
- close(fd);
+ pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
+ r = read(fd, ControlFile, sizeof(ControlFileData));
+ if (r != sizeof(ControlFileData))
+ {
+ if (r < 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ else
+ ereport(PANIC,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("could not read file \"%s\": read %d of %zu",
+ XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
+ }
+ pgstat_report_wait_end();
+
+ close(fd);
+ }
/*
* Check for expected pg_control format version. If this is wrong, the
@@ -4640,6 +4654,19 @@ UpdateControlFile(void)
update_controlfile(DataDir, ControlFile, true);
}
+/*
+ * Copy the postmaster's process-local pg_control image into *dest. This is
+ * the same in-memory copy that a forked backend inherits in its address
+ * space, and it is used under EXEC_BACKEND to hand the image to sub-processes
+ * via BackendParameters instead of having them re-read pg_control from disk.
+ */
+void
+GetLocalControlFileCopy(ControlFileData *dest)
+{
+ Assert(LocalControlFile != NULL);
+ memcpy(dest, LocalControlFile, sizeof(ControlFileData));
+}
+
/*
* Returns the unique system identifier from control file.
*/
@@ -5264,14 +5291,18 @@ show_effective_wal_level(void)
*
* reset just controls whether previous contents are to be expected (in the
* reset case, there's a dangling pointer into old shared memory), or not.
+ *
+ * If source is non-NULL, its contents are used instead of reading from disk.
+ * This supports EXEC_BACKEND sub-processes that inherit pg_control from the
+ * postmaster via BackendParameters.
*/
void
-LocalProcessControlFile(bool reset)
+LocalProcessControlFile(bool reset, struct ControlFileData *source)
{
Assert(reset || ControlFile == NULL);
LocalControlFile = palloc_object(ControlFileData);
ControlFile = LocalControlFile;
- ReadControlFile();
+ ReadControlFile(source);
SetLocalDataChecksumState(ControlFile->data_checksum_version);
}
@@ -5611,7 +5642,7 @@ BootStrapXLOG(uint32 data_checksum_version)
* Force control file to be read - in contrast to normal processing we'd
* otherwise never run the checks and GUC related initializations therein.
*/
- ReadControlFile();
+ ReadControlFile(NULL);
}
static char *
diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index 8f3cfea880c..75c6853dece 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -56,6 +56,7 @@
#include "utils/memutils.h"
#ifdef EXEC_BACKEND
+#include "catalog/pg_control.h"
#include "nodes/queryjumble.h"
#include "portability/instr_time.h"
#include "storage/pg_shmem.h"
@@ -141,6 +142,13 @@ typedef struct
ClientSocket client_sock;
InheritableSocket inh_sock;
+ /*
+ * Snapshot of pg_control taken by the postmaster. Inherited by the
+ * sub-process in place of re-reading the file from disk, to avoid torn
+ * reads and to match fork() inheritance semantics.
+ */
+ ControlFileData ControlFile;
+
/*
* Extra startup data, content depends on the child process.
*/
@@ -150,6 +158,13 @@ typedef struct
#define SizeOfBackendParameters(startup_data_len) (offsetof(BackendParameters, startup_data) + startup_data_len)
+/*
+ * Snapshot of pg_control inherited from the postmaster. Populated from
+ * BackendParameters in restore_backend_variables() and then consumed by the
+ * call to LocalProcessControlFile() in SubPostmasterMain().
+ */
+static ControlFileData InheritedControlFile;
+
static void read_backend_variables(char *id, void **startup_data, size_t *startup_data_len);
static void restore_backend_variables(BackendParameters *param);
@@ -662,10 +677,11 @@ SubPostmasterMain(int argc, char *argv[])
checkDataDir();
/*
- * (re-)read control file, as it contains config. The postmaster will
- * already have read this, but this process doesn't know about that.
+ * Set up control file from the snapshot the postmaster captured for us.
+ * Reading pg_control from disk here would race with concurrent updates
+ * from the startup process or checkpointer.
*/
- LocalProcessControlFile(false);
+ LocalProcessControlFile(false, &InheritedControlFile);
RegisterBuiltinShmemCallbacks();
@@ -772,6 +788,14 @@ save_backend_variables(BackendParameters *param,
strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
+ /*
+ * Hand the postmaster's process-local pg_control image to the
+ * sub-process. This mirrors what fork() inheritance does on platforms
+ * without EXEC_BACKEND, and it avoids re-reading pg_control from disk
+ * concurrently with the startup process or checkpointer updating it.
+ */
+ GetLocalControlFileCopy(¶m->ControlFile);
+
param->startup_data_len = startup_data_len;
if (startup_data_len > 0)
memcpy(param->startup_data, startup_data, startup_data_len);
@@ -1029,6 +1053,8 @@ restore_backend_variables(BackendParameters *param)
strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
+ memcpy(&InheritedControlFile, ¶m->ControlFile, sizeof(ControlFileData));
+
/*
* We need to restore fd.c's counts of externally-opened FDs; to avoid
* confusion, be sure to do this after restoring max_safe_fds. (Note:
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b6fd332f196..9c92ac09bd3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -916,7 +916,7 @@ PostmasterMain(int argc, char *argv[])
* processes will inherit the correct function pointer and not need to
* repeat the test.
*/
- LocalProcessControlFile(false);
+ LocalProcessControlFile(false, NULL);
/*
* Register the apply launcher. It's probably a good idea to call this
@@ -3265,7 +3265,7 @@ PostmasterStateMachine(void)
shmem_exit(1);
/* re-read control file into local memory */
- LocalProcessControlFile(true);
+ LocalProcessControlFile(true, NULL);
/*
* Re-initialize shared memory and semaphores. Note: We don't call
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2c1f14b7889..a35d99649e4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4175,7 +4175,7 @@ PostgresSingleUserMain(int argc, char *argv[],
CreateDataDirLockFile(false);
/* read control file (error checking and contains config ) */
- LocalProcessControlFile(false);
+ LocalProcessControlFile(false, NULL);
/* Register the shared memory needs of all core subsystems. */
RegisterBuiltinShmemCallbacks();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 437b4f32349..aab638e6a10 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -261,7 +261,9 @@ extern bool GetDefaultCharSignedness(void);
extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
extern void BootStrapXLOG(uint32 data_checksum_version);
extern void InitializeWalConsistencyChecking(void);
-extern void LocalProcessControlFile(bool reset);
+struct ControlFileData;
+extern void LocalProcessControlFile(bool reset, struct ControlFileData *source);
+extern void GetLocalControlFileCopy(struct ControlFileData *dest);
extern WalLevel GetActiveWalLevelOnStandby(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
--
2.39.5 (Apple Git-154)
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Incorrect checksum in control file with pg_rewind test
2025-09-04 15:18 Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
2025-11-07 15:19 ` Re: Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
2026-04-21 12:12 ` Re: Incorrect checksum in control file with pg_rewind test Alexander Korotkov <[email protected]>
@ 2026-04-21 18:00 ` Alexander Lakhin <[email protected]>
1 sibling, 0 replies; 6+ messages in thread
From: Alexander Lakhin @ 2026-04-21 18:00 UTC (permalink / raw)
To: Alexander Korotkov <[email protected]>; Maksim.Melnikov <[email protected]>; +Cc: [email protected]; Alvaro Herrera <[email protected]>
Hello Alexander and Maksim,
21.04.2026 15:12, Alexander Korotkov wrote:
> Thank you for spotting this issue and proposing a patch. ...
FWIW, this issue was also discussed at [1] (you can find the fix proposed
and even the commitfest entry there).
[1] https://www.postgresql.org/message-id/flat/202602021426.ztjk2wp6sgiy%40alvherre.pgsql
Best regards,
Alexander
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Incorrect checksum in control file with pg_rewind test
2025-09-04 15:18 Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
2025-11-07 15:19 ` Re: Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
2026-04-21 12:12 ` Re: Incorrect checksum in control file with pg_rewind test Alexander Korotkov <[email protected]>
@ 2026-05-12 08:30 ` Maksim.Melnikov <[email protected]>
1 sibling, 0 replies; 6+ messages in thread
From: Maksim.Melnikov @ 2026-05-12 08:30 UTC (permalink / raw)
To: Alexander Korotkov <[email protected]>; +Cc: [email protected]
On 4/21/26 15:12, Alexander Korotkov wrote:
> Thank you for spotting this issue and proposing a patch. The fork
> builds don't have this problem, because fork replicated contents of
> LocalControlFile to the new process. And the postmaster has
> consistent snapshot of control file as there is no concurrent process
> which could write it and that moment. But EXEC_BACKEND, even with
> your patch, may end up different processes with different contents of
> LocalControlFile. I don't see it could cause a material bug right
> now, but I see this as undesirable divergence between fork and
> EXEC_BACKEND behaviors. I propose an alternative approach copy the
> contents of control file to the new process via BackendParameters.
> This approach solves two problems at once: no torn reads, and no
> divergence between fork and EXEC_BACKEND.
Hi, Alexander!
Thanks for response! Your patch LGTM.
Best regards,
Maksim Melnikov
^ permalink raw reply [nested|flat] 6+ messages in thread
end of thread, other threads:[~2026-05-12 08:30 UTC | newest]
Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-09-04 15:18 Incorrect checksum in control file with pg_rewind test Maksim.Melnikov <[email protected]>
2025-11-07 15:19 ` Maksim.Melnikov <[email protected]>
2025-11-14 13:15 ` Ivan Kovmir <[email protected]>
2026-04-21 12:12 ` Alexander Korotkov <[email protected]>
2026-04-21 18:00 ` Alexander Lakhin <[email protected]>
2026-05-12 08:30 ` Maksim.Melnikov <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox