public inbox for [email protected]
restore_command on high-throughput cluster never switches to streaming replication
From: Kasper Føns @ 2025-11-24 13:46 UTC (permalink / raw)
To: [email protected]
Hi PostgreSQL community.
I debugged an instance where a PostgreSQL standby would not switch to
streaming replication when the `restore_command` fails.
*Expectation*
I expect PostgreSQL to try switching to streaming replication if the
`restore_command` fails.
*What happens*
PostgreSQL attempts to restore the previously restored WAL segment and then
retries the failed segment. However, because the primary produces WAL at a
high rate, the WAL file now exists and PostgreSQL does not try to switch to
streaming replication.
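The loop described above can be sketched as a toy model. This is only an illustration of the behaviour reported in this message, not PostgreSQL's recovery code: the tick-based clock, the function name `replay_from_archive`, and the rule that the standby tries streaming only when the retry of the failed segment also fails are all assumptions made for the sketch.

```python
def replay_from_archive(archived_at, max_ticks):
    """Toy model of the reported retry loop (hypothetical, not PostgreSQL code).

    archived_at(k) gives the tick at which segment k appears in the archive.
    Every restore_command call costs one tick.
    Returns ("streaming", segment, tick) if the standby would try streaming,
    or ("archive", segment, tick) if it is still on the archive at max_ticks.
    """
    tick = 0
    seg = 0
    while tick < max_ticks:
        tick += 1  # request the next segment
        if archived_at(seg) <= tick:
            seg += 1
            continue
        # The request failed. As reported, the standby first restores the
        # previous segment again and then retries the failed one.
        tick += 1  # re-restore of the previous segment
        tick += 1  # retry of the failed segment
        if archived_at(seg) <= tick:
            seg += 1  # the primary archived it in the meantime
            continue
        # Assumption: only a failed retry leads to a streaming attempt.
        return ("streaming", seg, tick)
    return ("archive", seg, tick)


# Primary archives a segment every 2 ticks: the retry always finds the file.
busy = replay_from_archive(lambda k: 2 * k, max_ticks=50)
# Primary archives a segment every 10 ticks: the retry fails as well.
quiet = replay_from_archive(lambda k: 10 * k, max_ticks=50)
print(busy[0], quiet[0])
```

In this model the busy primary keeps the standby on the archive indefinitely, while the quiet primary lets it reach the streaming attempt after the first failed retry.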
*Context*
Running PostgreSQL 15.7 in Kubernetes using CloudNative PostgreSQL Operator.
*Logs*
I configured PostgreSQL to emit DEBUG3 level logs. Newest logs first,
oldest last.
got WAL segment from archive
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json *000000410000A7BA00000058* pg_wal/RECOVERYXLOG"
got WAL segment from archive
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json *000000410000A7BA00000057* pg_wal/RECOVERYXLOG"
could not open file "pg_wal/*000000410000A7BA00000058*": No such file or directory
could not restore file "*000000410000A7BA00000058*" from archive: child process exited with exit code 1
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json *000000410000A7BA00000058* pg_wal/RECOVERYXLOG"
got WAL segment from archive
executing restore command "/controller/manager wal-restore --log-destination /controller/log/postgres.json *000000410000A7BA00000057* pg_wal/RECOVERYXLOG"
Notice that when *000000410000A7BA00000058* failed, PostgreSQL asked for
*000000410000A7BA00000057*, which it had already restored. Afterwards, it
asks for *000000410000A7BA00000058* once again.
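For anyone who wants to check their own logs for the same pattern, here is a minimal sketch in Python. It assumes the log lines have been reordered oldest first, the emphasis asterisks removed, and wrapped lines joined; the helper names `parse_attempts` and `find_step_backs` are made up for this example.

```python
import re

# A WAL segment file name is 24 uppercase hex digits.
SEGMENT = re.compile(r"\b[0-9A-F]{24}\b")

CMD = ('executing restore command "/controller/manager wal-restore '
       '--log-destination /controller/log/postgres.json {seg} pg_wal/RECOVERYXLOG"')
S57 = "000000410000A7BA00000057"
S58 = "000000410000A7BA00000058"

# The log lines quoted above, oldest first.
LOG_OLDEST_FIRST = [
    CMD.format(seg=S57),
    "got WAL segment from archive",
    CMD.format(seg=S58),
    f'could not restore file "{S58}" from archive: child process exited with exit code 1',
    f'could not open file "pg_wal/{S58}": No such file or directory',
    CMD.format(seg=S57),
    "got WAL segment from archive",
    CMD.format(seg=S58),
    "got WAL segment from archive",
]


def parse_attempts(lines):
    """Return (segment, succeeded) for each restore_command invocation."""
    attempts = []
    for line in lines:
        if line.startswith("executing restore command"):
            match = SEGMENT.search(line)
            if match:
                attempts.append([match.group(0), None])
        elif attempts and attempts[-1][1] is None:
            # The first outcome line after an invocation decides its result.
            if line.startswith("got WAL segment from archive"):
                attempts[-1][1] = True
            elif line.startswith("could not restore file"):
                attempts[-1][1] = False
    return [tuple(a) for a in attempts]


def find_step_backs(attempts):
    """Return (failed, next_requested) pairs where the request after a
    failure is for an earlier segment than the one that failed."""
    pairs = []
    for (seg, ok), (next_seg, _) in zip(attempts, attempts[1:]):
        # Equal-length uppercase hex names sort in WAL order on one timeline.
        if ok is False and next_seg < seg:
            pairs.append((seg, next_seg))
    return pairs


attempts = parse_attempts(LOG_OLDEST_FIRST)
step_backs = find_step_backs(attempts)
print(step_backs)
```

Each pair in `step_backs` marks a failed restore followed by a request for an earlier segment, which is the behaviour in question.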
*Problem*
This is problematic because the standby will never switch to streaming
replication.
*Workaround*
We can get the PostgreSQL replica in sync if we change the
`restore_command` to `/bin/false` once the standby is within
`wal_keep_size` of the primary.
*Question*
Is this the expected behaviour?
I expect the function `WaitForWALToBecomeAvailable` to switch to streaming
replication once a single `restore_command` fails. That switch does happen
when `/bin/false` is used as the `restore_command` instead.
Any help would be greatly appreciated.
/Kasper Føns
* Fwd: restore_command on high-throughput cluster never switches to streaming replication
From: Kasper Føns @ 2025-12-01 09:49 UTC (permalink / raw)
To: [email protected]
Hi PostgreSQL community.
I debugged an instance where a PostgreSQL standby would not switch to
streaming replication when the `restore_command` fails.
I first posted this to the pgsql-admin mailing list, but I am now trying
here as I got no response.
Any help would be greatly appreciated.
/Kasper Føns