public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
17+ messages / 4 participants
[nested] [flat]

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-02-26 16:19  Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-02-26 16:19 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: pgsql-hackers

> Shutdown may be indefinitely stuck under the following circumstances:

Thanks for reporting this issue and for the patch!

I was able to reproduce the problem on master.


On Tue, Feb 24, 2026 at 6:47 PM Anthonin Bonnefoy
<[email protected]> wrote:
> So, it looks like the root issue is more that the async LSN isn't
> updated when a transaction without xid is rollbacked.
> When going through CommitTransaction, such a transaction would still
> go through XLogSetAsyncXactLSN.
>
> I've updated the patch with this new approach: XLogSetAsyncXactLSN is
> now called in RecordTransactionAbort even when a xid wasn't assigned.
> With this, the logical walsender is able to force the flush of the
> last partial page using XLogBackgroundFlush.

I'm a bit concerned that this approach could cause walwriter to attempt
WAL writes more aggressively in some cases, adding overhead during
normal processing. If the goal is only to prevent shutdown from getting stuck,
instead, would it be sufficient for ShutdownXLOG() to call
XLogSetAsyncXactLSN() just before WalSndWaitStopping()?

Regards,

-- 
Fujii Masao






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-02 10:38  Anthonin Bonnefoy <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Anthonin Bonnefoy @ 2026-03-02 10:38 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: pgsql-hackers

On Thu, Feb 26, 2026 at 5:20 PM Fujii Masao <[email protected]> wrote:
> Thanks for reporting this issue and for the patch!
>
> I was able to reproduce the problem on master.

Thanks for the confirmation!

> I'm a bit concerned that this approach could cause walwriter to attempt
> WAL writes more aggressively in some cases, adding overhead during
> normal processing. If the goal is only to prevent shutdown from getting stuck,
> instead, would it be sufficient for ShutdownXLOG() to call
> XLogSetAsyncXactLSN() just before WalSndWaitStopping()?

I did something similar with the first version using
XLogFlush(WriteRqstPtr), and it indeed fixes the shutdown issue.

Though I was under the impression that calling XLogSetAsyncXactLSN()
in RecordTransactionAbort was a better approach, as it is similar to
reporting the latest async abort. You may have a large backlog of
records, like selects pruning a lot of pages and timing out, which the
next commit will have to flush. Notifying the walwriter allows it to
flush those records it can.

Maybe for the context of a backport, patching ShutdownXLOG has the
benefit of minimising the amount of changes and risks?

I've updated the RecordTransactionAbort approach for now, with a small
optimization. XactLastRecEnd may be 0 if nothing was written, and we
can skip the unnecessary spinlocks in this case.

Regards,
Anthonin Bonnefoy


Attachments:

  [application/octet-stream] v3-0001-Fix-stuck-shutdown-due-to-unflushed-records.patch (2.7K, 2-v3-0001-Fix-stuck-shutdown-due-to-unflushed-records.patch)
  download | inline diff:
From 319254f19f9ddf1b7c93a58d9b0bab40b0a00d85 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 24 Feb 2026 09:24:48 +0100
Subject: Fix stuck shutdown due to unflushed records

Shutdown sequence may be stuck indefinitely under the following
circumstances:
- Data checksums is enabled
- A logical replication walsender is running
- A select in an explicit transaction tries to prune a full heap page,
  wrote a FPI_FOR_HINT record which crosses the page boundary
- The select is rollbacked (or killed)
- 'pg_ctl stop' is sent

The FPI_FOR_HINT record is likely going to be a contrecord and starts a
new page. However, as the select is rollbacked, XLogSetAsyncXactLSN
isn't called to advance the LSN to include this record.

When the checkpointer starts ShutdownXLOG(), all walsenders will be
notified to stop. However, the logical replication walsender will be
stuck in the following infinite loop:
- Tries to read the last FPI_FOR_HINT record
- The page with the record header is read
- tot_len > len, the record needs to be reassembled
- Tries to read the next page containing the rest of the record. It fails since this page was never written.
- xlog reader state is reset with XLogReaderInvalReadState
- It goes back to the start of WalSndLoop's loop

There are some attempts done by the walsender to flush the WAL using
XLogBackgroundFlush. However, XLogBackgroundFlush only writes completed
blocks, or up to the latest known async lsn.

Since the select was rollbacked, XLogBackgroundFlush doesn't flush the
next partial page.

This patch fixes the issue by advancing the async LSN, even when the
transaction doesn't have an assigned xid. This allows
XLogBackgroundFlush to write the necessary partial page when called by
the walsender.
---
 src/backend/access/transam/xact.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index eba4f063168..15729d9e2da 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1787,7 +1787,19 @@ RecordTransactionAbort(bool isSubXact)
 	{
 		/* Reset XactLastRecEnd until the next transaction writes something */
 		if (!isSubXact)
+		{
+			/*
+			 * Even if no xid was assigned, some records may have been written
+			 * in the WAL. Report the latest async LSN, so that the WAL writer
+			 * knows to flush those records. This is important when shutting
+			 * down, walsender may use XLogBackgroundFlush to trigger pending
+			 * WAL to be written out. If they're not tracked by async xact
+			 * lsn, they won't be written by XLogBackgroundFlush.
+			 */
+			if (XactLastRecEnd != 0)
+				XLogSetAsyncXactLSN(XactLastRecEnd);
 			XactLastRecEnd = 0;
+		}
 		return InvalidTransactionId;
 	}
 
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-03 17:11  Anthonin Bonnefoy <[email protected]>
  parent: Anthonin Bonnefoy <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Anthonin Bonnefoy @ 2026-03-03 17:11 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: pgsql-hackers

Here's a small updated version of the patch, with the 2 different approaches.

- 0001: This is the XLogSetAsyncXactLSN call in RecordTransactionAbort
approach. The small difference with v3 is that the 'XactLastRecEnd !=
0' condition is now merged with !isSubXact:
+if (!isSubXact && XactLastRecEnd != 0)
+{
+        XLogSetAsyncXactLSN(XactLastRecEnd);
         XactLastRecEnd = 0;
+}

- 0002: This is the ShutdownXLOG approach. I've used
XLogFlush(WriteRqstPtr) instead of updating the async LSN. It feels
like if we're going to stop the walsenders, we may as well flush
everything and get the WAL in a good state.
The spinlock to access XLogCtl->LogwrtRqst.Write is probably
unnecessary since we're at a point where no additional WAL records
should be written, but it doesn't hurt to keep consistency.

Regards,
Anthonin Bonnefoy


Attachments:

  [application/octet-stream] v4-0002-Fix-stuck-shutdown-due-to-unflushed-records.patch (2.7K, 2-v4-0002-Fix-stuck-shutdown-due-to-unflushed-records.patch)
  download | inline diff:
From 0db213c05c1fb5df84950159ad991059a40c3d71 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 3 Mar 2026 17:42:40 +0100
Subject: Fix stuck shutdown due to unflushed records

Shutdown sequence may be stuck indefinitely under the following
circumstances:
- Data checksums is enabled
- A logical replication walsender is running
- A select in an explicit transaction tries to prune a full heap page,
  wrote a FPI_FOR_HINT record which crosses the page boundary
- The select is rollbacked (or killed)
- 'pg_ctl stop' is sent

The FPI_FOR_HINT record is likely going to be a contrecord and starts a
new page. However, as the select is rollbacked, XLogSetAsyncXactLSN
isn't called to advance the LSN to include this record.

When the checkpointer starts ShutdownXLOG(), all walsenders will be
notified to stop. However, the logical replication walsender will be
stuck in the following infinite loop:
- Tries to read the last FPI_FOR_HINT record
- The page with the record header is read
- tot_len > len, the record needs to be reassembled
- Tries to read the next page containing the rest of the record. It fails since this page was never written.
- xlog reader state is reset with XLogReaderInvalReadState
- It goes back to the start of WalSndLoop's loop

There are some attempts done by the walsender to flush the WAL using
XLogBackgroundFlush. However, XLogBackgroundFlush only writes completed
blocks, or up to the latest known async lsn.

Since the select was rollbacked, XLogBackgroundFlush doesn't flush the
next partial page.

This patch fixes the issue by advancing flushing all records before
signaling the walsenders to stop, avoiding the case where the walsenders
read a partially written record.
---
 src/backend/access/transam/xlog.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 354ac645bdc..31afb249d5b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6710,6 +6710,8 @@ GetLastSegSwitchData(XLogRecPtr *lastSwitchLSN)
 void
 ShutdownXLOG(int code, Datum arg)
 {
+	XLogRecPtr	WriteRqstPtr;
+
 	/*
 	 * We should have an aux process resource owner to use, and we should not
 	 * be in a transaction that's installed some other resowner.
@@ -6723,6 +6725,15 @@ ShutdownXLOG(int code, Datum arg)
 	ereport(IsPostmasterEnvironment ? LOG : NOTICE,
 			(errmsg("shutting down")));
 
+	/*
+	 * We may have unflushed records, make sure everything is flushed before
+	 * stopping the walsenders.
+	 */
+	SpinLockAcquire(&XLogCtl->info_lck);
+	WriteRqstPtr = XLogCtl->LogwrtRqst.Write;
+	SpinLockRelease(&XLogCtl->info_lck);
+	XLogFlush(WriteRqstPtr);
+
 	/*
 	 * Signal walsenders to move to stopping state.
 	 */
-- 
2.52.0



  [application/octet-stream] v4-0001-Fix-stuck-shutdown-due-to-unflushed-records.patch (2.8K, 3-v4-0001-Fix-stuck-shutdown-due-to-unflushed-records.patch)
  download | inline diff:
From a2310f65695d9481354100a503c92b7cd8255a2d Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 24 Feb 2026 09:24:48 +0100
Subject: Fix stuck shutdown due to unflushed records

Shutdown sequence may be stuck indefinitely under the following
circumstances:
- Data checksums is enabled
- A logical replication walsender is running
- A select in an explicit transaction tries to prune a full heap page,
  wrote a FPI_FOR_HINT record which crosses the page boundary
- The select is rollbacked (or killed)
- 'pg_ctl stop' is sent

The FPI_FOR_HINT record is likely going to be a contrecord and starts a
new page. However, as the select is rollbacked, XLogSetAsyncXactLSN
isn't called to advance the LSN to include this record.

When the checkpointer starts ShutdownXLOG(), all walsenders will be
notified to stop. However, the logical replication walsender will be
stuck in the following infinite loop:
- Tries to read the last FPI_FOR_HINT record
- The page with the record header is read
- tot_len > len, the record needs to be reassembled
- Tries to read the next page containing the rest of the record. It fails since this page was never written.
- xlog reader state is reset with XLogReaderInvalReadState
- It goes back to the start of WalSndLoop's loop

There are some attempts done by the walsender to flush the WAL using
XLogBackgroundFlush. However, XLogBackgroundFlush only writes completed
blocks, or up to the latest known async lsn.

Since the select was rollbacked, XLogBackgroundFlush doesn't flush the
next partial page.

This patch fixes the issue by advancing the async LSN, even when the
transaction doesn't have an assigned xid. This allows
XLogBackgroundFlush to write the necessary partial page when called by
the walsender.
---
 src/backend/access/transam/xact.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index eba4f063168..1786b397769 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1786,8 +1786,19 @@ RecordTransactionAbort(bool isSubXact)
 	if (!TransactionIdIsValid(xid))
 	{
 		/* Reset XactLastRecEnd until the next transaction writes something */
-		if (!isSubXact)
+		if (!isSubXact && XactLastRecEnd != 0)
+		{
+			/*
+			 * Even if no xid was assigned, some records may have been written
+			 * in the WAL. Report the latest async LSN, so that the WAL writer
+			 * knows to flush those records. This is important when shutting
+			 * down, walsender may use XLogBackgroundFlush to trigger pending
+			 * WAL to be written out. If they're not tracked by async xact
+			 * lsn, they won't be written by XLogBackgroundFlush.
+			 */
+			XLogSetAsyncXactLSN(XactLastRecEnd);
 			XactLastRecEnd = 0;
+		}
 		return InvalidTransactionId;
 	}
 
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-03 17:29  Fujii Masao <[email protected]>
  parent: Anthonin Bonnefoy <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-03-03 17:29 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: pgsql-hackers

On Wed, Mar 4, 2026 at 2:11 AM Anthonin Bonnefoy
<[email protected]> wrote:
>
> Here's a small updated version of the patch, with the 2 different approaches.

Thanks for the patches!


> - 0001: This is the XLogSetAsyncXactLSN call in RecordTransactionAbort
> approach. The small difference with v3 is that the 'XactLastRecEnd !=
> 0' condition is now merged with !isSubXact:
> +if (!isSubXact && XactLastRecEnd != 0)
> +{
> +        XLogSetAsyncXactLSN(XactLastRecEnd);
>          XactLastRecEnd = 0;
> +}

The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
more like an improvement than a bug fix. Since it changes
RecordTransactionAbort(), it could have unintended impact on the system.

It may be a reasonable idea (though I'm not certain yet), but for a bug fix
I believe we should first apply the minimal change necessary to resolve
the issue. If needed, this approach could then be proposed later separately as
an improvement for the next major version.


> - 0002: This is the ShutdownXLOG approach. I've used
> XLogFlush(WriteRqstPtr) instead of updating the async LSN. It feels
> like if we're going to stop the walsenders, we may as well flush
> everything and get the WAL in a good state.
> The spinlock to access XLogCtl->LogwrtRqst.Write is probably
> unnecessary since we're at a point where no additional WAL records
> should be written, but it doesn't hurt to keep consistency.

As a simpler alternative, would it make sense for walsender to call
XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
If there isn't a clear reason for that choice, directly calling XLogFlush()
might be the simpler solution. Thought?

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-04 08:51  Anthonin Bonnefoy <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Anthonin Bonnefoy @ 2026-03-04 08:51 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: pgsql-hackers

On Tue, Mar 3, 2026 at 6:29 PM Fujii Masao <[email protected]> wrote:
> The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
> more like an improvement than a bug fix. Since it changes
> RecordTransactionAbort(), it could have unintended impact on the system.
>
> It may be a reasonable idea (though I'm not certain yet), but for a bug fix
> I believe we should first apply the minimal change necessary to resolve
> the issue. If needed, this approach could then be proposed later separately as
> an improvement for the next major version.

Agreed, that's definitely a change that can have a large impact. I
will open a separate thread later.

> As a simpler alternative, would it make sense for walsender to call
> XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
> shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
> If there isn't a clear reason for that choice, directly calling XLogFlush()
> might be the simpler solution. Thought?

That sounds like a good solution. I've tried it and it fixes the
issue. And this only changes the shutdown behaviour in the walsender.

The use of XLogBackgroundFlush() has been introduced with
c6c333436491, but there's no mention why it was specifically used. I
guess the assumption was that a change would either be flushed with a
commit, or tracked by async LSN through rollback, so
XLogBackgroundFlush() would always write pending records. But this
turns out to be false in the case of this bug.

I've updated the patch with this approach.

Regards,
Anthonin Bonnefoy


Attachments:

  [application/octet-stream] v5-0001-Fix-stuck-shutdown-due-to-unflushed-records.patch (2.3K, 2-v5-0001-Fix-stuck-shutdown-due-to-unflushed-records.patch)
  download | inline diff:
From 0e06c5fe451832276f26cb3a86111228e78525a4 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 3 Mar 2026 17:42:40 +0100
Subject: Fix stuck shutdown due to unflushed records

Shutdown sequence may be stuck indefinitely under the following
circumstances:
- Data checksums is enabled
- A logical replication walsender is running
- A select in an explicit transaction tries to prune a full heap page,
  wrote a FPI_FOR_HINT record which crosses the page boundary
- The select is rollbacked (or killed)
- 'pg_ctl stop' is sent

The FPI_FOR_HINT record is likely going to be a contrecord and starts a
new page. However, as the select is rollbacked, XLogSetAsyncXactLSN
isn't called to advance the LSN to include this record.

When the checkpointer starts ShutdownXLOG(), all walsenders will be
notified to stop. However, the logical replication walsender will be
stuck in the following infinite loop:
- Tries to read the last FPI_FOR_HINT record
- The page with the record header is read
- tot_len > len, the record needs to be reassembled
- Tries to read the next page containing the rest of the record. It fails since this page was never written.
- xlog reader state is reset with XLogReaderInvalReadState
- It goes back to the start of WalSndLoop's loop

There are some attempts done by the walsender to flush the WAL using
XLogBackgroundFlush. However, XLogBackgroundFlush only writes completed
blocks, or up to the latest known async lsn. Since the select was
rollbacked, XLogBackgroundFlush doesn't flush the next partial page.

This patch fixes the issue by replacing XLogBackgroundFlush() by
XLogFlush(GetXLogInsertRecPtr()), flushing all pending records without
depending on async LSN to be up to date.
---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2cde8ebc729..5a6a618678d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1886,7 +1886,7 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * written, because walwriter has shut down already.
 		 */
 		if (got_STOPPING)
-			XLogBackgroundFlush();
+			XLogFlush(GetXLogInsertRecPtr());
 
 		/*
 		 * To avoid the scenario where standbys need to catch up to a newer
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-04 17:46  Fujii Masao <[email protected]>
  parent: Anthonin Bonnefoy <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-03-04 17:46 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: pgsql-hackers

On Wed, Mar 4, 2026 at 5:52 PM Anthonin Bonnefoy
<[email protected]> wrote:
>
> On Tue, Mar 3, 2026 at 6:29 PM Fujii Masao <[email protected]> wrote:
> > The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
> > more like an improvement than a bug fix. Since it changes
> > RecordTransactionAbort(), it could have unintended impact on the system.
> >
> > It may be a reasonable idea (though I'm not certain yet), but for a bug fix
> > I believe we should first apply the minimal change necessary to resolve
> > the issue. If needed, this approach could then be proposed later separately as
> > an improvement for the next major version.
>
> Agreed, that's definitely a change that can have a large impact. I
> will open a separate thread later.
>
> > As a simpler alternative, would it make sense for walsender to call
> > XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
> > shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
> > If there isn't a clear reason for that choice, directly calling XLogFlush()
> > might be the simpler solution. Thought?
>
> That sounds like a good solution. I've tried it and it fixes the
> issue. And this only changes the shutdown behaviour in the walsender.
>
> The use of XLogBackgroundFlush() has been introduced with
> c6c333436491, but there's no mention why it was specifically used. I
> guess the assumption was that a change would either be flushed with a
> commit, or tracked by async LSN through rollback, so
> XLogBackgroundFlush() would always write pending records. But this
> turns out to be false in the case of this bug.
>
> I've updated the patch with this approach.

Thanks for updating the patch!

Barring any objections, I will commit the patch and backpatch it to
all supported branches.

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-05 08:40  Anthonin Bonnefoy <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Anthonin Bonnefoy @ 2026-03-05 08:40 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Fujii Masao <[email protected]>; pgsql-hackers

Hi Alexander,

On Thu, Mar 5, 2026 at 7:30 AM Alexander Lakhin <[email protected]> wrote:
> Hello Anthonin and Masao-san,
> Thank you for working on this!
>
> It looks like the same issue was discovered and discussed before, but that
> time without a final fix: [1]. I tried v5 patch with my
> 099_walsender_stop.pl test and it executed 100 internal iterations
> successfully, while without the patch it failed for me on iterations 6, 8, 5.
>
> [1] https://www.postgresql.org/message-id/flat/f15d665f-4cd1-4894-037c-afdbe369287e%40gmail.com

Thanks for the tests and the additional context.

Looking at the the thread, the latest patch provided a similar solution using:
+ XLogFlush(GetInsertRecPtr());

So it was relying on GetInsertRecPtr() instead of
GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
only returns the position of the last full xlog page, meaning it
doesn't fix the issue we have where the last partial page contains a
continuation record.

Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
get the shutdown stuck issue.

Using GetXLogInsertRecPtr() is required to make sure the last partial
page is correctly flushed.

Regards,
Anthonin Bonnefoy





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-05 23:46  Fujii Masao <[email protected]>
  parent: Anthonin Bonnefoy <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-03-05 23:46 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; pgsql-hackers

On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
<[email protected]> wrote:
> So it was relying on GetInsertRecPtr() instead of
> GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
> only returns the position of the last full xlog page, meaning it
> doesn't fix the issue we have where the last partial page contains a
> continuation record.
>
> Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
> get the shutdown stuck issue.
>
> Using GetXLogInsertRecPtr() is required to make sure the last partial
> page is correctly flushed.

Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
almost nothing during recovery, I added a !RecoveryInProgress() check
as follows. I've attached the latest version of the patch and updated
the commit message.

- if (got_STOPPING)
- XLogBackgroundFlush();
+ if (got_STOPPING && !RecoveryInProgress())
+ XLogFlush(GetXLogInsertRecPtr());

Regards,

-- 
Fujii Masao


Attachments:

  [application/octet-stream] v6-0001-Fix-publisher-shutdown-hang-caused-by-logical-wal.patch (2.0K, 2-v6-0001-Fix-publisher-shutdown-hang-caused-by-logical-wal.patch)
  download | inline diff:
From 1897c4b5979853b2fb4d679787ac9b633f183076 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 3 Mar 2026 17:42:40 +0100
Subject: [PATCH v6] Fix publisher shutdown hang caused by logical walsender
 busy loop.

Previously, when logical replication was running, shutting down
the publisher could cause the logical walsender to enter a busy loop
and prevent the publisher from completing shutdown.

During shutdown, the logical walsender waits for all pending WAL
to be written out. However, some WAL records could remain unflushed,
causing the walsender to wait indefinitely.

The issue occurred because the walsender used XLogBackgroundFlush() to
flush pending WAL. This function does not guarantee that all WAL is written.
For example, WAL generated by a transaction without an assigned
transaction ID that aborts might not be flushed.

This commit fixes the bug by making the logical walsender call XLogFlush()
instead, ensuring that all pending WAL is written and preventing
the busy loop during shutdown.

Backpatch to all supported versions.

Author: Anthonin Bonnefoy <[email protected]>
Reviewed-by: Alexander Law <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Discussion: https://postgr.es/m/CAO6_Xqo3co3BuUVEVzkaBVw9LidBgeeQ_2hfxeLMQcXwovB3GQ@mail.gmail.com
Backpatch-through: 14
---
 src/backend/replication/walsender.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2cde8ebc729..917d2a0c3f4 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1885,8 +1885,8 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * otherwise we'd possibly end up waiting for WAL that never gets
 		 * written, because walwriter has shut down already.
 		 */
-		if (got_STOPPING)
-			XLogBackgroundFlush();
+		if (got_STOPPING && !RecoveryInProgress())
+			XLogFlush(GetXLogInsertRecPtr());
 
 		/*
 		 * To avoid the scenario where standbys need to catch up to a newer
-- 
2.51.2



^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-06 07:48  Fujii Masao <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-03-06 07:48 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; pgsql-hackers

On Fri, Mar 6, 2026 at 8:46 AM Fujii Masao <[email protected]> wrote:
>
> On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
> <[email protected]> wrote:
> > So it was relying on GetInsertRecPtr() instead of
> > GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
> > only returns the position of the last full xlog page, meaning it
> > doesn't fix the issue we have where the last partial page contains a
> > continuation record.
> >
> > Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
> > get the shutdown stuck issue.
> >
> > Using GetXLogInsertRecPtr() is required to make sure the last partial
> > page is correctly flushed.
>
> Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
> almost nothing during recovery, I added a !RecoveryInProgress() check
> as follows. I've attached the latest version of the patch and updated
> the commit message.
>
> - if (got_STOPPING)
> - XLogBackgroundFlush();
> + if (got_STOPPING && !RecoveryInProgress())
> + XLogFlush(GetXLogInsertRecPtr());

I've pushed the patch. Thanks!

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-10 17:11  Andres Freund <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Andres Freund @ 2026-03-10 17:11 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Anthonin Bonnefoy <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

Hi,

On 2026-03-06 16:48:06 +0900, Fujii Masao wrote:
> On Fri, Mar 6, 2026 at 8:46 AM Fujii Masao <[email protected]> wrote:
> >
> > On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
> > <[email protected]> wrote:
> > > So it was relying on GetInsertRecPtr() instead of
> > > GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
> > > only returns the position of the last full xlog page, meaning it
> > > doesn't fix the issue we have where the last partial page contains a
> > > continuation record.
> > >
> > > Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
> > > get the shutdown stuck issue.
> > >
> > > Using GetXLogInsertRecPtr() is required to make sure the last partial
> > > page is correctly flushed.
> >
> > Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
> > almost nothing during recovery, I added a !RecoveryInProgress() check
> > as follows. I've attached the latest version of the patch and updated
> > the commit message.
> >
> > - if (got_STOPPING)
> > - XLogBackgroundFlush();
> > + if (got_STOPPING && !RecoveryInProgress())
> > + XLogFlush(GetXLogInsertRecPtr());
> 
> I've pushed the patch. Thanks!

I'm pretty sure this is not correct as-is, it suffers from the same issue as
https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx%40e4rvwavtjswo
I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
flush to, due to page boundaries.

Greetings,

Andres Freund





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-12 14:07  Anthonin Bonnefoy <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Anthonin Bonnefoy @ 2026-03-12 14:07 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Fujii Masao <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

On Tue, Mar 10, 2026 at 6:11 PM Andres Freund <[email protected]> wrote:
> I'm pretty sure this is not correct as-is, it suffers from the same issue as
> https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx%40e4rvwavtjswo
> I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
> flush to, due to page boundaries.

I've managed to reproduce this issue by ensuring the FPI_FOR_HINT
record finishes at the end of a page with the following script (might
need some adjustment if the record sizes are different):

DROP TABLE IF EXISTS test_insert_rec_ptr;
CREATE TABLE test_insert_rec_ptr(aid int, data text) WITH
(autovacuum_enabled = false);
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 100) FROM
generate_series(0, 57);
-- This should tag the page as full
BEGIN; UPDATE test_insert_rec_ptr SET aid=2 where aid=1; ROLLBACK;
CHECKPOINT;
-- Start with a fresh file
SELECT pg_switch_wal();
-- Our FPI_FOR_HINT writes 8193 bytes
-- With the long header,  the first  page has 8152 bytes available
-- With the short header, the second page has 8168 bytes available
-- We want our FPI_FOR_HINT to finish at the end of the second page
(+/- 8 bytes of alignment)
-- We need to write the first 25 bytes (or 32 with alignment) in the first page
-- For that, we need to write 8120 bytes of WAL records
BEGIN;
-- 264 bytes of FPW
INSERT INTO test_insert_rec_ptr VALUES(1);
-- 74 * 104 bytes
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 44) FROM
generate_series(1, 74);
-- 108 bytes
INSERT INTO test_insert_rec_ptr VALUES(1, repeat('a', 48));
-- 46 bytes
COMMIT;
-- 264 + 74 * 104 + 46 + 108 = 8114 bytes, which will round up to 8120
with alignment
-- FPI_FOR_HINT record should be at 0x1FE0
BEGIN; SELECT * FROM test_insert_rec_ptr WHERE aid=2; ROLLBACK;

As far as I can tell, the only impact it has is to complain about the
write request being too far:
LOG:  request to flush past end of generated WAL; request 0/01604018,
current position 0/01604000
ERROR:  xlog flush request 0/01604018 is not satisfied --- flushed
only to 0/01604000

To avoid this issue, it sounds like we need something to use
XLogBytePosToEndRecPtr instead of XLogBytePosToRecPtr to convert the
byte position? With XLogBytePosToRecPtr(), the flush request would
stop at 01604000 instead of going to the next page with 01604018.

In the attached patch, I've added a GetXLogInsertEndRecPtr() function
which is similar to GetXLogInsertRecPtr(), except it uses
XLogBytePosToEndRecPtr() to stop at the page boundary.
There was also another XLogFlush(GetXLogWriteRecPtr()) call in
syncutils.c, so I replaced both calls with
XLogFlush(GetXLogInsertEndRecPtr()).

Regards,
Anthonin Bonnefoy


Attachments:

  [application/octet-stream] v1-0001-Fix-flushing-record-ending-at-page-boundary.patch (3.4K, 2-v1-0001-Fix-flushing-record-ending-at-page-boundary.patch)
  download | inline diff:
From 5cfdb02ef6d0697e5670356345fc89af78d1eeb9 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Thu, 12 Mar 2026 14:29:21 +0100
Subject: Fix flushing record ending at page boundary

In 6eedb2a5fd, a call to XLogFlush(GetXLogInsertRecPtr()) has been added
to allow walsender to flush the latest WAL record. However, if the last
record is at the end of a page, GetXLogInsertRecPtr() will return the
start position for the next record, which will be located in the next
page, after the page header.

XLogInsert will complain with a 'xlog flush request 0/03604018 is not
satisfied --- flushed only to 0/03604000' error, as the flush request
tries to write WAL that hasn't been reserved yet.

This patch fixes the issue by introducing and using a
GetXLogInsertEndRecPtr() which stops at the page boundary, instead of
the beginning of the next page.
---
 src/backend/access/transam/xlog.c           | 18 ++++++++++++++++++
 src/backend/replication/logical/syncutils.c |  2 +-
 src/backend/replication/walsender.c         |  2 +-
 src/include/access/xlog.h                   |  1 +
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b9b678f3722..9fd90636ee1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9595,6 +9595,24 @@ GetXLogInsertRecPtr(void)
 	return XLogBytePosToRecPtr(current_bytepos);
 }
 
+/*
+ * Like GetXLogInsertRecPtr, but if the position is at a page boundary, returns
+ * a pointer to the beginning of the page (ie. before page header), not to where
+ * the first xlog record on that page would go to.
+ */
+XLogRecPtr
+GetXLogInsertEndRecPtr(void)
+{
+	XLogCtlInsert *Insert = &XLogCtl->Insert;
+	uint64		current_bytepos;
+
+	SpinLockAcquire(&Insert->insertpos_lck);
+	current_bytepos = Insert->CurrBytePos;
+	SpinLockRelease(&Insert->insertpos_lck);
+
+	return XLogBytePosToEndRecPtr(current_bytepos);
+}
+
 /*
  * Get latest WAL write pointer
  */
diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c
index ef61ca0437d..8c5da44d42e 100644
--- a/src/backend/replication/logical/syncutils.c
+++ b/src/backend/replication/logical/syncutils.c
@@ -62,7 +62,7 @@ FinishSyncWorker(void)
 	}
 
 	/* And flush all writes. */
-	XLogFlush(GetXLogWriteRecPtr());
+	XLogFlush(GetXLogInsertEndRecPtr());
 
 	if (am_sequencesync_worker())
 	{
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 79fc192b171..dd46de7bcd6 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1887,7 +1887,7 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * written, because walwriter has shut down already.
 		 */
 		if (got_STOPPING && !RecoveryInProgress())
-			XLogFlush(GetXLogInsertRecPtr());
+			XLogFlush(GetXLogInsertEndRecPtr());
 
 		/*
 		 * To avoid the scenario where standbys need to catch up to a newer
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index fdfb572467b..958f39edda4 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -238,6 +238,7 @@ extern bool RecoveryInProgress(void);
 extern RecoveryState GetRecoveryState(void);
 extern bool XLogInsertAllowed(void);
 extern XLogRecPtr GetXLogInsertRecPtr(void);
+extern XLogRecPtr GetXLogInsertEndRecPtr(void);
 extern XLogRecPtr GetXLogWriteRecPtr(void);
 
 extern uint64 GetSystemIdentifier(void);
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-12 17:24  Fujii Masao <[email protected]>
  parent: Anthonin Bonnefoy <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-03-12 17:24 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: Andres Freund <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

On Thu, Mar 12, 2026 at 11:08 PM Anthonin Bonnefoy
<[email protected]> wrote:
>
> On Tue, Mar 10, 2026 at 6:11 PM Andres Freund <[email protected]> wrote:
> > I'm pretty sure this is not correct as-is, it suffers from the same issue as
> > https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx%40e4rvwavtjswo
> > I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
> > flush to, due to page boundaries.

Thanks for the report!


> I've managed to reproduce this issue by ensuring the FPI_FOR_HINT
> record finishes at the end of a page with the following script (might
> need some adjustment if the record sizes are different):
>
> DROP TABLE IF EXISTS test_insert_rec_ptr;
> CREATE TABLE test_insert_rec_ptr(aid int, data text) WITH
> (autovacuum_enabled = false);
> INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 100) FROM
> generate_series(0, 57);
> -- This should tag the page as full
> BEGIN; UPDATE test_insert_rec_ptr SET aid=2 where aid=1; ROLLBACK;
> CHECKPOINT;
> -- Start with a fresh file
> SELECT pg_switch_wal();
> -- Our FPI_FOR_HINT writes 8193 bytes
> -- With the long header,  the first  page has 8152 bytes available
> -- With the short header, the second page has 8168 bytes available
> -- We want our FPI_FOR_HINT to finish at the end of the second page
> (+/- 8 bytes of alignment)
> -- We need to write the first 25 bytes (or 32 with alignment) in the first page
> -- For that, we need to write 8120 bytes of WAL records
> BEGIN;
> -- 264 bytes of FPW
> INSERT INTO test_insert_rec_ptr VALUES(1);
> -- 74 * 104 bytes
> INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 44) FROM
> generate_series(1, 74);
> -- 108 bytes
> INSERT INTO test_insert_rec_ptr VALUES(1, repeat('a', 48));
> -- 46 bytes
> COMMIT;
> -- 264 + 74 * 104 + 46 + 108 = 8114 bytes, which will round up to 8120
> with alignment
> -- FPI_FOR_HINT record should be at 0x1FE0
> BEGIN; SELECT * FROM test_insert_rec_ptr WHERE aid=2; ROLLBACK;
>
> As far as I can tell, the only impact it has is to complain about the
> write request being too far:
> LOG:  request to flush past end of generated WAL; request 0/01604018,
> current position 0/01604000
> ERROR:  xlog flush request 0/01604018 is not satisfied --- flushed
> only to 0/01604000
>
> To avoid this issue, it sounds like we need something to use
> XLogBytePosToEndRecPtr instead of XLogBytePosToRecPtr to convert the
> byte position? With XLogBytePosToRecPtr(), the flush request would
> stop at 01604000 instead of going to the next page with 01604018.
>
> In the attached patch, I've added a GetXLogInsertEndRecPtr() function
> which is similar to GetXLogInsertRecPtr(), except it uses
> XLogBytePosToEndRecPtr() to stop at the page boundary.
> There was also another XLogFlush(GetXLogWriteRecPtr()) call in
> syncutils.c, so I replaced both calls with
> XLogFlush(GetXLogInsertEndRecPtr()).

Thanks for investigating the issue and making the patch!
It looks good to me.

Andres,

Do you have any comments on the proposed patch?

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-16 05:39  Fujii Masao <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-03-16 05:39 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: Andres Freund <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 2:24 AM Fujii Masao <[email protected]> wrote:
> Thanks for investigating the issue and making the patch!
> It looks good to me.

Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
I updated the patch to use it. Patch attached.
Barring any objections, I will commit it.

-       XLogFlush(GetXLogWriteRecPtr());
+       XLogFlush(GetXLogInsertEndRecPtr());

I excluded the above change from the patch because it seems like a separate
issue. I also wonder whether this code could cause an error in XLogFlush()
even when GetXLogWriteRecPtr() is used.

Regards,

-- 
Fujii Masao


Attachments:

  [application/octet-stream] v2-0001-Fix-WAL-flush-LSN-used-by-logical-walsender-durin.patch (2.4K, 2-v2-0001-Fix-WAL-flush-LSN-used-by-logical-walsender-durin.patch)
  download | inline diff:
From abafee8d06358f9a362373899c330d010d20804a Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Mon, 16 Mar 2026 13:15:13 +0900
Subject: [PATCH v2] Fix WAL flush LSN used by logical walsender during
 shutdown

Commit 6eedb2a5fd8 made the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) to ensure that all pending WAL is flushed,
fixing a publisher shutdown hang. However, if the last WAL record ends at
a page boundary, GetXLogInsertRecPtr() can return an LSN pointing past
the page header, which can cause XLogFlush() to report an error.

A similar issue previously existed in the GiST code. Commit b1f14c96720
introduced GetXLogInsertEndRecPtr(), which returns a safe WAL insertion end
location (returning the start of the page when the last record ends at a page
boundary), and updated the GiST code to use it with XLogFlush().

This commit fixes the issue by making the logical walsender use
XLogFlush(GetXLogInsertEndRecPtr()) when flushing pending WAL during shutdown.

Backpatch to all supported versions.

Reported-by: Andres Freund <[email protected]>
Author: Anthonin Bonnefoy <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Discussion: https://postgr.es/m/vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux
---
 src/backend/replication/walsender.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 376ff46340d..08253103cb3 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1890,9 +1890,15 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * If we're shutting down, trigger pending WAL to be written out,
 		 * otherwise we'd possibly end up waiting for WAL that never gets
 		 * written, because walwriter has shut down already.
+		 *
+		 * Note that GetXLogInsertEndRecPtr() is used to obtain the WAL flush
+		 * request location instead of GetXLogInsertRecPtr(). Because if the
+		 * last WAL record ends at a page boundary, GetXLogInsertRecPtr() can
+		 * return an LSN pointing past the page header, which may cause
+		 * XLogFlush() to report an error.
 		 */
 		if (got_STOPPING && !RecoveryInProgress())
-			XLogFlush(GetXLogInsertRecPtr());
+			XLogFlush(GetXLogInsertEndRecPtr());
 
 		/*
 		 * To avoid the scenario where standbys need to catch up to a newer
-- 
2.51.2



^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-16 09:45  Anthonin Bonnefoy <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Anthonin Bonnefoy @ 2026-03-16 09:45 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Andres Freund <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

On Mon, Mar 16, 2026 at 6:39 AM Fujii Masao <[email protected]> wrote:
> Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
> I updated the patch to use it. Patch attached.
> Barring any objections, I will commit it.
>
> -       XLogFlush(GetXLogWriteRecPtr());
> +       XLogFlush(GetXLogInsertEndRecPtr());

Thanks for the updated patch! I've run my test script against the
patch and there's no more "xlog flush request xxx is not satisfied"
errors reported.

> I excluded the above change from the patch because it seems like a separate
> issue. I also wonder whether this code could cause an error in XLogFlush()
> even when GetXLogWriteRecPtr() is used.

Ha right, I've mixed Insert and Write and thought that
FinishSyncWorker was also doing a XlogFlush(GetXLogInsertRecPtr())
when writing the patch. If I try to trigger the partial record issue,
GetXLogWriteRecPtr() points at the end of the WAL page containing the
beginning of the FPI_FOR_HINT, there's no attempt to flush in the
future. So FinishSyncWorker doesn't seem impacted by the issue.

Regards,
Anthonin Bonnefoy





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-16 23:16  Fujii Masao <[email protected]>
  parent: Anthonin Bonnefoy <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Fujii Masao @ 2026-03-16 23:16 UTC (permalink / raw)
  To: Anthonin Bonnefoy <[email protected]>; +Cc: Andres Freund <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

On Mon, Mar 16, 2026 at 6:45 PM Anthonin Bonnefoy
<[email protected]> wrote:
>
> On Mon, Mar 16, 2026 at 6:39 AM Fujii Masao <[email protected]> wrote:
> > Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
> > I updated the patch to use it. Patch attached.
> > Barring any objections, I will commit it.
> >
> > -       XLogFlush(GetXLogWriteRecPtr());
> > +       XLogFlush(GetXLogInsertEndRecPtr());
>
> Thanks for the updated patch! I've run my test script against the
> patch and there's no more "xlog flush request xxx is not satisfied"
> errors reported.

Thanks for the test! I've pushed the patch.


> > I excluded the above change from the patch because it seems like a separate
> > issue. I also wonder whether this code could cause an error in XLogFlush()
> > even when GetXLogWriteRecPtr() is used.
>
> Ha right, I've mixed Insert and Write and thought that
> FinishSyncWorker was also doing a XlogFlush(GetXLogInsertRecPtr())
> when writing the patch. If I try to trigger the partial record issue,
> GetXLogWriteRecPtr() points at the end of the WAL page containing the
> beginning of the FPI_FOR_HINT, there's no attempt to flush in the
> future. So FinishSyncWorker doesn't seem impacted by the issue.

Understood. Thanks for the investigation!

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-16 23:26  Michael Paquier <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Michael Paquier @ 2026-03-16 23:26 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Anthonin Bonnefoy <[email protected]>; Andres Freund <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

On Tue, Mar 17, 2026 at 08:16:08AM +0900, Fujii Masao wrote:
> On Mon, Mar 16, 2026 at 6:45 PM Anthonin Bonnefoy
> <[email protected]> wrote:
>> Thanks for the updated patch! I've run my test script against the
>> patch and there's no more "xlog flush request xxx is not satisfied"
>> errors reported.
> 
> Thanks for the test! I've pushed the patch.

This stuff seems sensible enough that I think we should at least have
a test, no?  It does not have to be absolutely perfect in terms of
reproducibility, just good enough to be able to detect it across the
buildfarm.  We already do various things with page boundaries in WAL
during recovery, and a shutdown could be perhaps timed to increase the
reproducibility rate of the issues discussed?
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
@ 2026-03-17 16:51  Anthonin Bonnefoy <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 0 replies; 17+ messages in thread

From: Anthonin Bonnefoy @ 2026-03-17 16:51 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Fujii Masao <[email protected]>; Andres Freund <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers

On Tue, Mar 17, 2026 at 12:26 AM Michael Paquier <[email protected]> wrote:
> This stuff seems sensible enough that I think we should at least have
> a test, no?  It does not have to be absolutely perfect in terms of
> reproducibility, just good enough to be able to detect it across the
> buildfarm.  We already do various things with page boundaries in WAL
> during recovery, and a shutdown could be perhaps timed to increase the
> reproducibility rate of the issues discussed?

I initially thought that there was no easy way to trigger this issue
reliably in a test: the script I've been using won't work as soon as
there are changes in the record sizes. Then I remembered that
pg_logical_emit_message existed and could be used to write a WAL
record of a specific size, without allocating a xid and without
flushing the record.

With this, the test can be simplified to:
SELECT pg_switch_wal();
BEGIN;
SELECT pg_logical_emit_message(false, '', repeat('a', 16265), false);
ROLLBACK;

Any change in WAL short header, long header or xl_logical_message
struct will "break" the test since the record won't be at the exact
end of the page boundary. This also assumes that we have an 8 byte
alignment. 32 bits machine will have the WAL record ends at 3FF0, so
not exactly the end, but that should be fine to test different
conditions.

A word of caution about this test: While running it on my machine,
I've managed to trigger some weird WAL corruption. The new segment
after the switch had 1 or 2 excessive bytes at the start of the
segment just before the xlog page magic, shifting the whole file. The
first time it happened, I thought I'd messed something up and added
the bytes myself while looking at the WAL with imhex. The second time,
I've only run the script, and the new segment had a 1.1MB size shortly
after, so I'm pretty sure I didn't do anything that could have
introduced those excessive bytes.

I'm still trying to understand the trigger conditions (some race
condition between the switch and the walwriter?), but if this test is
merged, it may trigger this WAL corruption issue on the buildfarm.

Regards,
Anthonin Bonnefoy


Attachments:

  [application/octet-stream] v1-0001-Add-test-shutting-down-walsender-with-unflushed-r.patch (2.9K, 2-v1-0001-Add-test-shutting-down-walsender-with-unflushed-r.patch)
  download | inline diff:
From 97b9ba4374b5423f40b3868d83b515d94969dfa6 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 17 Mar 2026 15:07:55 +0100
Subject: Add test shutting down walsender with unflushed record

6eedb2a5fd8 fixed an issue where the walsender was stuck in a busy loop,
trying to read an unflused record.

d927b4bd97 fixed another issue introduced by the previous commit, where
XLogFlush would be called past the end of the generated WAL, generating
an error log.

This commit adds a test to cover those two fixes. By writing a logical
message of a specific size, we can reach a state where the last record
is crossing the page boundary, and ends at the end of the next page,
recreating the conditions for the aforementioned issues.
---
 src/test/recovery/t/006_logical_decoding.pl | 50 ++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 97d11f98b59..1efb6f7b527 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -275,7 +275,55 @@ is( $node_primary->safe_psql(
 	qq(Check that reset timestamp is later after resetting stats for slot '$stats_test_slot1' again.)
 );
 
-# done with the node
+SKIP:
+{
+
+	# some Windows Perls at least don't like IPC::Run's start/kill_kill regime.
+	skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32';
+
+	# Test stopping the primary with an active walsender and an unflushed record
+	# which crosses the page boundary and ends at the end of the next page.
+	#
+	# First, start pg_recvlogical to have an active walsender
+	my $pg_recvlogical = IPC::Run::start(
+		[
+			'pg_recvlogical',
+			'--dbname' => $node_primary->connstr('postgres'),
+			'--slot' => 'test_slot',
+			'--file' => '-',
+			'--start'
+		]);
+
+	# Then, we write a logical message WAL record which finishes at the end of a
+	# WAL page, using a rollback so the WAL record isn't flushed.
+	#
+	# The size of a WAL logical message record is 55 bytes + message length
+	# Starting from a fresh WAL segment, we have:
+	#   - 8152 bytes available in the first page (long header)
+	#   - 8168 bytes available in the second page (short header)
+	# We need to write 16320 bytes of logical message WAL record, which can be done
+	# using a 16265 bytes long message.
+	$node_primary->safe_psql('postgres',
+		qq[
+		SELECT pg_switch_wal();
+		BEGIN;
+		SELECT pg_logical_emit_message(false, '', repeat('a', 16265), false);
+		ROLLBACK;
+		]
+	);
+
+	# try to restart
+	$node_primary->restart;
+	$pg_recvlogical->kill_kill;
+
+	my $logfile = slurp_file($node_primary->logfile());
+	unlike(
+		$logfile,
+		qr/request to flush past end of generated WAL/,
+		"There's no flush request past end of generated WAL");
+}
+
+# stop the node
 $node_primary->stop;
 
 done_testing();
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 17+ messages in thread


end of thread, other threads:[~2026-03-17 16:51 UTC | newest]

Thread overview: 17+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-26 16:19 Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record Fujii Masao <[email protected]>
2026-03-02 10:38 ` Anthonin Bonnefoy <[email protected]>
2026-03-03 17:11   ` Anthonin Bonnefoy <[email protected]>
2026-03-03 17:29     ` Fujii Masao <[email protected]>
2026-03-04 08:51       ` Anthonin Bonnefoy <[email protected]>
2026-03-04 17:46         ` Fujii Masao <[email protected]>
2026-03-05 08:40           ` Anthonin Bonnefoy <[email protected]>
2026-03-05 23:46             ` Fujii Masao <[email protected]>
2026-03-06 07:48               ` Fujii Masao <[email protected]>
2026-03-10 17:11                 ` Andres Freund <[email protected]>
2026-03-12 14:07                   ` Anthonin Bonnefoy <[email protected]>
2026-03-12 17:24                     ` Fujii Masao <[email protected]>
2026-03-16 05:39                       ` Fujii Masao <[email protected]>
2026-03-16 09:45                         ` Anthonin Bonnefoy <[email protected]>
2026-03-16 23:16                           ` Fujii Masao <[email protected]>
2026-03-16 23:26                             ` Michael Paquier <[email protected]>
2026-03-17 16:51                               ` Anthonin Bonnefoy <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox