public inbox for [email protected]help / color / mirror / Atom feed
[PATCH] Fix WAIT FOR LSN standby_write/standby_flush for archive recovery cases 3+ messages / 2 participants [nested] [flat]
* [PATCH] Fix WAIT FOR LSN standby_write/standby_flush for archive recovery cases @ 2026-04-15 06:44 SATYANARAYANA NARLAPURAM <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: SATYANARAYANA NARLAPURAM @ 2026-04-15 06:44 UTC (permalink / raw) To: PostgreSQL Hackers <[email protected]>; Alexander Korotkov <[email protected]> Hi Alexnader, Hackers, GetCurrentLSNForWaitType() for WAIT_LSN_TYPE_STANDBY_WRITE and WAIT_LSN_TYPE_STANDBY_FLUSH previously relied on the WAL receiver's tracked write/flush positions (GetWalRcvWriteRecPtr/GetWalRcvFlushRecPtr). There are two scenarios where WAIT FOR LSN queries can be stalled though replay is making progress. Breaking it down to two to give clarity on setups but the underlying problem is the same. There are two scenarios here: (1). When the standby is disconnected from the primary and switched to WAL archive mode, it continues to be in that mode until no more WAL is available to replay and then switch to streaming mode. Until then WAIT FOR LSN calls get stuck on the standby though replay catches up beyond the stale WAL receiver position. Switching XLog source from archive to streaming is separately tracked in [1]. (2). In the case of Archive recovery, no WAL receiver process exists, so these functions return InvalidXLogRecPtr (0/0). WAIT FOR LSN with standby_flush or standby_write modes would always time out, even for WAL that has been fully replayed. Fix by falling back to the replay LSN (GetXLogReplayRecPtr) when the WAL receiver position is invalid or behind replay. This is correct because any WAL that has been replayed has necessarily already been written and flushed to disk. Attached the repro test case. [1]: https://www.postgresql.org/message-id/[email protected]... Thanks, Satya Attachments: [application/octet-stream] 0001-Fix-WAIT-FOR-LSN-standby_write-standby_flush-for-arc.patch (2.8K, 3-0001-Fix-WAIT-FOR-LSN-standby_write-standby_flush-for-arc.patch) download | inline diff: From ba79f259177b137b969660f0311e7bef37efc1cf Mon Sep 17 00:00:00 2001 From: Satya Narlapuram <[email protected]> Date: Wed, 15 Apr 2026 06:24:35 +0000 Subject: [PATCH] Fix WAIT FOR LSN standby_write/standby_flush for archive recovery GetCurrentLSNForWaitType() for WAIT_LSN_TYPE_STANDBY_WRITE and WAIT_LSN_TYPE_STANDBY_FLUSH previously relied solely on the WAL receiver's tracked write/flush positions (GetWalRcvWriteRecPtr/GetWalRcvFlushRecPtr). In case of Archive recovery, no WAL receiver process exists, so these functions return InvalidXLogRecPtr (0/0). WAIT FOR LSN with standby_flush or standby_write modes would always time out, even for WAL that has been fully replayed. Fix by falling back to the replay LSN (GetXLogReplayRecPtr) when the WAL receiver position is invalid or behind replay. This is correct because any WAL that has been replayed has necessarily already been written and flushed to disk. --- src/backend/access/transam/xlogwait.c | 30 ++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c index 2e31c0d6..d7844e8d 100644 --- a/src/backend/access/transam/xlogwait.c +++ b/src/backend/access/transam/xlogwait.c @@ -93,6 +93,13 @@ StaticAssertDecl(lengthof(WaitLSNWaitEvents) == WAIT_LSN_TYPE_COUNT, /* * Get the current LSN for the specified wait type. + * + * For standby write and flush types, we first consult the WAL receiver's + * tracked position. However, during archive recovery (or after pg_rewind + * when the WAL receiver hasn't caught up), the WAL receiver may report 0/0 + * or a position behind the replay LSN. In those cases, we fall back to the + * replay LSN, because any WAL that has been replayed has necessarily already + * been written and flushed to disk. */ XLogRecPtr GetCurrentLSNForWaitType(WaitLSNType lsnType) @@ -105,10 +112,27 @@ GetCurrentLSNForWaitType(WaitLSNType lsnType) return GetXLogReplayRecPtr(NULL); case WAIT_LSN_TYPE_STANDBY_WRITE: - return GetWalRcvWriteRecPtr(); - case WAIT_LSN_TYPE_STANDBY_FLUSH: - return GetWalRcvFlushRecPtr(NULL, NULL); + { + XLogRecPtr walrcvPtr; + XLogRecPtr replayPtr; + + if (lsnType == WAIT_LSN_TYPE_STANDBY_WRITE) + walrcvPtr = GetWalRcvWriteRecPtr(); + else + walrcvPtr = GetWalRcvFlushRecPtr(NULL, NULL); + + replayPtr = GetXLogReplayRecPtr(NULL); + + /* + * Use the WAL receiver position if it is valid and ahead of + * the replay position. Otherwise, fall back to the replay + * LSN. This handles archive recovery (no WAL receiver). + */ + if (walrcvPtr != InvalidXLogRecPtr && walrcvPtr > replayPtr) + return walrcvPtr; + return replayPtr; + } case WAIT_LSN_TYPE_PRIMARY_FLUSH: return GetFlushRecPtr(NULL); -- 2.43.0 [application/octet-stream] 0001-Add-TAP-test-for-WAIT-FOR-LSN-during-archive-recover.patch (7.4K, 4-0001-Add-TAP-test-for-WAIT-FOR-LSN-during-archive-recover.patch) download | inline diff: From 79a273913cd0d88e7b980ef198e01c465532b08b Mon Sep 17 00:00:00 2001 From: Satya Narlapuram <[email protected]> Date: Wed, 15 Apr 2026 06:29:12 +0000 Subject: [PATCH] Add TAP test for WAIT FOR LSN during archive recovery Add a dedicated test (050_wait_for_lsn_archive.pl) that verifies WAIT FOR LSN standby_write, standby_flush, and standby_replay modes work correctly on an archive-only standby (no WAL receiver / no streaming replication). Tests cover: - All three standby modes return success for already-replayed LSNs - Data visibility after WAIT FOR on archive standby - Correct timeout for unreachable LSNs - Incrementally archived WAL (not just initial recovery) - primary_flush correctly errors on standby - Confirms no WAL receiver is present throughout the test --- .../recovery/t/050_wait_for_lsn_archive.pl | 169 ++++++++++++++++++ 1 file changed, 169 insertions(+) create mode 100644 src/test/recovery/t/050_wait_for_lsn_archive.pl diff --git a/src/test/recovery/t/050_wait_for_lsn_archive.pl b/src/test/recovery/t/050_wait_for_lsn_archive.pl new file mode 100644 index 00000000..050fce64 --- /dev/null +++ b/src/test/recovery/t/050_wait_for_lsn_archive.pl @@ -0,0 +1,169 @@ +# Checks that WAIT FOR LSN works correctly during archive recovery, +# where no WAL receiver process exists. Without the fix in +# GetCurrentLSNForWaitType(), standby_write and standby_flush modes +# would always time out because GetWalRcvWriteRecPtr()/ +# GetWalRcvFlushRecPtr() return InvalidXLogRecPtr (0/0) when there +# is no WAL receiver. +use strict; +use warnings FATAL => 'all'; + +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; +use File::Path qw(make_path); + +# Initialize primary node with archiving enabled +my $node_primary = PostgreSQL::Test::Cluster->new('primary'); +$node_primary->init(has_archiving => 1, allows_streaming => 1); +$node_primary->start; + +# Create test table and insert initial data +$node_primary->safe_psql('postgres', + "CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a"); + +# Take a backup for the archive standby +my $backup_name = 'archive_backup'; +$node_primary->backup($backup_name); + +# Create an archive-only standby (no streaming replication). +# We explicitly disable primary_conninfo so that no WAL receiver +# is started; WAL is obtained solely via restore_command. +my $node_standby = PostgreSQL::Test::Cluster->new('standby'); +$node_standby->init_from_backup($node_primary, $backup_name, + has_restoring => 1); +$node_standby->start; + +# Confirm that the standby is in recovery and has no WAL receiver +my $in_recovery = + $node_standby->safe_psql('postgres', "SELECT pg_is_in_recovery()"); +is($in_recovery, 't', "archive standby is in recovery"); + +my $walrcv_count = $node_standby->safe_psql('postgres', + "SELECT count(*) FROM pg_stat_wal_receiver"); +is($walrcv_count, '0', "archive standby has no WAL receiver"); + +# Insert data on primary and switch WAL to force archiving +$node_primary->safe_psql('postgres', + "INSERT INTO wait_test VALUES (generate_series(11, 20))"); +my $lsn1 = + $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()"); +$node_primary->safe_psql('postgres', "SELECT pg_switch_wal()"); + +# Wait for archive standby to replay up to the target LSN +$node_standby->poll_query_until('postgres', + "SELECT pg_last_wal_replay_lsn() >= '${lsn1}'::pg_lsn"); + +# 1. Test standby_replay mode - should succeed (this always worked) +my $output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${lsn1}' WITH (MODE 'standby_replay', TIMEOUT '5s', NO_THROW); +]); +is($output, 'success', + "archive standby: standby_replay returns success for replayed LSN"); + +# 2. Test standby_flush mode - this was broken before the fix +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${lsn1}' WITH (MODE 'standby_flush', TIMEOUT '5s', NO_THROW); +]); +is($output, 'success', + "archive standby: standby_flush returns success for replayed LSN"); + +# 3. Test standby_write mode - this was broken before the fix +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${lsn1}' WITH (MODE 'standby_write', TIMEOUT '5s', NO_THROW); +]); +is($output, 'success', + "archive standby: standby_write returns success for replayed LSN"); + +# 4. Verify data visibility after WAIT FOR on the archive standby +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${lsn1}' WITH (MODE 'standby_flush', TIMEOUT '5s'); + SELECT count(*) FROM wait_test; +]); +is((split("\n", $output))[-1], + '20', "archive standby: data is visible after WAIT FOR standby_flush"); + +# 5. Test that an unreachable LSN still times out correctly +my $unreachable_lsn = + $node_primary->safe_psql('postgres', + "SELECT pg_current_wal_insert_lsn() + 10000000000"); + +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${unreachable_lsn}' WITH (MODE 'standby_flush', TIMEOUT '100ms', NO_THROW); +]); +is($output, 'timeout', + "archive standby: standby_flush correctly times out for unreachable LSN"); + +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${unreachable_lsn}' WITH (MODE 'standby_write', TIMEOUT '100ms', NO_THROW); +]); +is($output, 'timeout', + "archive standby: standby_write correctly times out for unreachable LSN"); + +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${unreachable_lsn}' WITH (MODE 'standby_replay', TIMEOUT '100ms', NO_THROW); +]); +is($output, 'timeout', + "archive standby: standby_replay correctly times out for unreachable LSN"); + +# 6. Test with a second batch of WAL - verify the fix works for +# incrementally archived and restored WAL, not just initial recovery. +$node_primary->safe_psql('postgres', + "INSERT INTO wait_test VALUES (generate_series(21, 30))"); +my $lsn2 = + $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()"); +$node_primary->safe_psql('postgres', "SELECT pg_switch_wal()"); + +# Wait for archive standby to catch up +$node_standby->poll_query_until('postgres', + "SELECT pg_last_wal_replay_lsn() >= '${lsn2}'::pg_lsn"); + +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${lsn2}' WITH (MODE 'standby_flush', TIMEOUT '5s', NO_THROW); +]); +is($output, 'success', + "archive standby: standby_flush works for incrementally archived WAL"); + +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${lsn2}' WITH (MODE 'standby_write', TIMEOUT '5s', NO_THROW); +]); +is($output, 'success', + "archive standby: standby_write works for incrementally archived WAL"); + +$output = $node_standby->safe_psql( + 'postgres', qq[ + WAIT FOR LSN '${lsn2}' WITH (MODE 'standby_replay', TIMEOUT '5s'); + SELECT count(*) FROM wait_test; +]); +is((split("\n", $output))[-1], + '30', + "archive standby: all 30 rows visible after incremental archive recovery" +); + +# 7. Test that primary_flush correctly errors on the archive standby +my $stderr; +$node_standby->psql( + 'postgres', + "WAIT FOR LSN '${lsn2}' WITH (MODE 'primary_flush');", + stderr => \$stderr); +ok($stderr =~ /recovery is in progress/, + "archive standby: primary_flush correctly errors on standby"); + +# 8. Verify no WAL receiver appeared during the tests +$walrcv_count = $node_standby->safe_psql('postgres', + "SELECT count(*) FROM pg_stat_wal_receiver"); +is($walrcv_count, '0', + "archive standby: still no WAL receiver after all tests"); + +$node_standby->stop; +$node_primary->stop; + +done_testing(); -- 2.43.0 ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: [PATCH] Fix WAIT FOR LSN standby_write/standby_flush for archive recovery cases @ 2026-04-16 07:31 Alexander Korotkov <[email protected]> parent: SATYANARAYANA NARLAPURAM <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: Alexander Korotkov @ 2026-04-16 07:31 UTC (permalink / raw) To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> Hi, Satyanarayana! On Wed, Apr 15, 2026 at 9:44 AM SATYANARAYANA NARLAPURAM <[email protected]> wrote: > GetCurrentLSNForWaitType() for WAIT_LSN_TYPE_STANDBY_WRITE and > WAIT_LSN_TYPE_STANDBY_FLUSH previously relied on the WAL receiver's > tracked write/flush positions (GetWalRcvWriteRecPtr/GetWalRcvFlushRecPtr). > There are two scenarios where WAIT FOR LSN queries can be stalled though > replay is making progress. Breaking it down to two to give clarity on setups but > the underlying problem is the same. > > There are two scenarios here: > > (1). When the standby is disconnected from the primary and switched to WAL > archive mode, it continues to be in that mode until no more WAL is available to replay > and then switch to streaming mode. Until then WAIT FOR LSN calls get stuck on the > standby though replay catches up beyond the stale WAL receiver position. Switching > XLog source from archive to streaming is separately tracked in [1]. > > (2). In the case of Archive recovery, no WAL receiver process exists, so these > functions return InvalidXLogRecPtr (0/0). WAIT FOR LSN with standby_flush or > standby_write modes would always time out, even for WAL that has been > fully replayed. > > Fix by falling back to the replay LSN (GetXLogReplayRecPtr) when the WAL > receiver position is invalid or behind replay. This is correct because any > WAL that has been replayed has necessarily already been written and flushed > to disk. Attached the repro test case. Please, check, similar patch is already posted here. https://www.postgresql.org/message-id/CABPTF7Ukk8iJF7TpnK2mFOaboNJgWL1csfXu4e3J4GT0o7x0GQ%40mail.gma... ------ Regards, Alexander Korotkov Supabase ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: [PATCH] Fix WAIT FOR LSN standby_write/standby_flush for archive recovery cases @ 2026-04-16 07:33 SATYANARAYANA NARLAPURAM <[email protected]> parent: Alexander Korotkov <[email protected]> 0 siblings, 0 replies; 3+ messages in thread From: SATYANARAYANA NARLAPURAM @ 2026-04-16 07:33 UTC (permalink / raw) To: Alexander Korotkov <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> Hi, On Thu, Apr 16, 2026 at 12:31 AM Alexander Korotkov <[email protected]> wrote: > Hi, Satyanarayana! > > On Wed, Apr 15, 2026 at 9:44 AM SATYANARAYANA NARLAPURAM > <[email protected]> wrote: > > GetCurrentLSNForWaitType() for WAIT_LSN_TYPE_STANDBY_WRITE and > > WAIT_LSN_TYPE_STANDBY_FLUSH previously relied on the WAL receiver's > > tracked write/flush positions > (GetWalRcvWriteRecPtr/GetWalRcvFlushRecPtr). > > There are two scenarios where WAIT FOR LSN queries can be stalled though > > replay is making progress. Breaking it down to two to give clarity on > setups but > > the underlying problem is the same. > > > > There are two scenarios here: > > > > (1). When the standby is disconnected from the primary and switched to > WAL > > archive mode, it continues to be in that mode until no more WAL is > available to replay > > and then switch to streaming mode. Until then WAIT FOR LSN calls get > stuck on the > > standby though replay catches up beyond the stale WAL receiver position. > Switching > > XLog source from archive to streaming is separately tracked in [1]. > > > > (2). In the case of Archive recovery, no WAL receiver process exists, so > these > > functions return InvalidXLogRecPtr (0/0). WAIT FOR LSN with > standby_flush or > > standby_write modes would always time out, even for WAL that has been > > fully replayed. > > > > Fix by falling back to the replay LSN (GetXLogReplayRecPtr) when the WAL > > receiver position is invalid or behind replay. This is correct because > any > > WAL that has been replayed has necessarily already been written and > flushed > > to disk. Attached the repro test case. > > Please, check, similar patch is already posted here. > > > https://www.postgresql.org/message-id/CABPTF7Ukk8iJF7TpnK2mFOaboNJgWL1csfXu4e3J4GT0o7x0GQ%40mail.gma... Thanks, I will review it. ^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2026-04-16 07:33 UTC | newest] Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-04-15 06:44 [PATCH] Fix WAIT FOR LSN standby_write/standby_flush for archive recovery cases SATYANARAYANA NARLAPURAM <[email protected]> 2026-04-16 07:31 ` Alexander Korotkov <[email protected]> 2026-04-16 07:33 ` SATYANARAYANA NARLAPURAM <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox