Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Anthonin Bonnefoy <[email protected]>
To: Michael Paquier <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
Date: Tue, 17 Mar 2026 17:51:40 +0100
Message-ID: <CAO6_XqqaaejpuoQ5y04n6tX6ZdsWLtEyvyV3fEm2A9tSaWL5qw@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAO6_Xqq73TPa3M6nQ7RqRhKkcphy1JX7aNGTYy-x_Sn+6a8Z_Q@mail.gmail.com>
	<CAHGQGwGvnpN=2bo+F7H90YLFcx9=SazwLkcx+0gEcrbQy5NVZg@mail.gmail.com>
	<CAHGQGwECpyJtMqkCEvyqgZDiwAeMj3RKobui7jONrDd35W0x3Q@mail.gmail.com>
	<vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux>
	<CAO6_Xqq=-KnxrFQvmyF++XH4ngtXVhcDB979rEn6SPtUhSmNYg@mail.gmail.com>
	<CAHGQGwGpfEsBJBLQDou7xccnb0tj1T_NQv2gZfuQfgzn_b=RsQ@mail.gmail.com>
	<CAHGQGwFsuPcC7Aov73FTqMC5LEOdZ6uoWno5BR1HX2LnZRNKBw@mail.gmail.com>
	<CAO6_XqqvqMqUgZ0swSkB2VNUFos6b6Vbt9BpOEGCHz4nCQEi+g@mail.gmail.com>
	<CAHGQGwFxoU0O6BDPp1_7D9aYh--sp9csC2WtiUkMR36oLXRCuA@mail.gmail.com>
	<[email protected]>

On Tue, Mar 17, 2026 at 12:26 AM Michael Paquier <[email protected]> wrote:
> This stuff seems sensible enough that I think we should at least have
> a test, no?  It does not have to be absolutely perfect in terms of
> reproducibility, just good enough to be able to detect it across the
> buildfarm.  We already do various things with page boundaries in WAL
> during recovery, and a shutdown could be perhaps timed to increase the
> reproducibility rate of the issues discussed?

I initially thought that there was no easy way to trigger this issue
reliably in a test: the script I've been using won't work as soon as
there are changes in the record sizes. Then I remembered that
pg_logical_emit_message existed and could be used to write a WAL
record of a specific size, without allocating a xid and without
flushing the record.

With this, the test can be simplified to:
SELECT pg_switch_wal();
BEGIN;
SELECT pg_logical_emit_message(false, '', repeat('a', 16265), false);
ROLLBACK;

Any change in WAL short header, long header or xl_logical_message
struct will "break" the test since the record won't be at the exact
end of the page boundary. This also assumes that we have an 8 byte
alignment. 32 bits machine will have the WAL record ends at 3FF0, so
not exactly the end, but that should be fine to test different
conditions.

A word of caution about this test: While running it on my machine,
I've managed to trigger some weird WAL corruption. The new segment
after the switch had 1 or 2 excessive bytes at the start of the
segment just before the xlog page magic, shifting the whole file. The
first time it happened, I thought I'd messed something up and added
the bytes myself while looking at the WAL with imhex. The second time,
I've only run the script, and the new segment had a 1.1MB size shortly
after, so I'm pretty sure I didn't do anything that could have
introduced those excessive bytes.

I'm still trying to understand the trigger conditions (some race
condition between the switch and the walwriter?), but if this test is
merged, it may trigger this WAL corruption issue on the buildfarm.

Regards,
Anthonin Bonnefoy


Attachments:

  [application/octet-stream] v1-0001-Add-test-shutting-down-walsender-with-unflushed-r.patch (2.9K, 2-v1-0001-Add-test-shutting-down-walsender-with-unflushed-r.patch)
  download | inline diff:
From 97b9ba4374b5423f40b3868d83b515d94969dfa6 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 17 Mar 2026 15:07:55 +0100
Subject: Add test shutting down walsender with unflushed record

6eedb2a5fd8 fixed an issue where the walsender was stuck in a busy loop,
trying to read an unflused record.

d927b4bd97 fixed another issue introduced by the previous commit, where
XLogFlush would be called past the end of the generated WAL, generating
an error log.

This commit adds a test to cover those two fixes. By writing a logical
message of a specific size, we can reach a state where the last record
is crossing the page boundary, and ends at the end of the next page,
recreating the conditions for the aforementioned issues.
---
 src/test/recovery/t/006_logical_decoding.pl | 50 ++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 97d11f98b59..1efb6f7b527 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -275,7 +275,55 @@ is( $node_primary->safe_psql(
 	qq(Check that reset timestamp is later after resetting stats for slot '$stats_test_slot1' again.)
 );
 
-# done with the node
+SKIP:
+{
+
+	# some Windows Perls at least don't like IPC::Run's start/kill_kill regime.
+	skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32';
+
+	# Test stopping the primary with an active walsender and an unflushed record
+	# which crosses the page boundary and ends at the end of the next page.
+	#
+	# First, start pg_recvlogical to have an active walsender
+	my $pg_recvlogical = IPC::Run::start(
+		[
+			'pg_recvlogical',
+			'--dbname' => $node_primary->connstr('postgres'),
+			'--slot' => 'test_slot',
+			'--file' => '-',
+			'--start'
+		]);
+
+	# Then, we write a logical message WAL record which finishes at the end of a
+	# WAL page, using a rollback so the WAL record isn't flushed.
+	#
+	# The size of a WAL logical message record is 55 bytes + message length
+	# Starting from a fresh WAL segment, we have:
+	#   - 8152 bytes available in the first page (long header)
+	#   - 8168 bytes available in the second page (short header)
+	# We need to write 16320 bytes of logical message WAL record, which can be done
+	# using a 16265 bytes long message.
+	$node_primary->safe_psql('postgres',
+		qq[
+		SELECT pg_switch_wal();
+		BEGIN;
+		SELECT pg_logical_emit_message(false, '', repeat('a', 16265), false);
+		ROLLBACK;
+		]
+	);
+
+	# try to restart
+	$node_primary->restart;
+	$pg_recvlogical->kill_kill;
+
+	my $logfile = slurp_file($node_primary->logfile());
+	unlike(
+		$logfile,
+		qr/request to flush past end of generated WAL/,
+		"There's no flush request past end of generated WAL");
+}
+
+# stop the node
 $node_primary->stop;
 
 done_testing();
-- 
2.53.0

view thread (17+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
  In-Reply-To: <CAO6_XqqaaejpuoQ5y04n6tX6ZdsWLtEyvyV3fEm2A9tSaWL5qw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox