public inbox for [email protected]
From: Fujii Masao <[email protected]>
To: Anthonin Bonnefoy <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
Date: Fri, 6 Mar 2026 08:46:15 +0900
Message-ID: <CAHGQGwGvnpN=2bo+F7H90YLFcx9=SazwLkcx+0gEcrbQy5NVZg@mail.gmail.com>
In-Reply-To: <CAO6_Xqq73TPa3M6nQ7RqRhKkcphy1JX7aNGTYy-x_Sn+6a8Z_Q@mail.gmail.com>
References: <CAO6_Xqo3co3BuUVEVzkaBVw9LidBgeeQ_2hfxeLMQcXwovB3GQ@mail.gmail.com>
<CAO6_XqrZEREa5d+dyjahX6bteBhoN=8Jid-3a4f6Q35sWrv9eg@mail.gmail.com>
<CAHGQGwHFKF+x4E+SqedMCnmLCitxjTUUtSyL_+mMeuq-GbEt6w@mail.gmail.com>
<CAO6_Xqp+ADb6KZVWLMALu3xmwVUEO8S1EiCnp38mG6BrHrEnuA@mail.gmail.com>
<CAO6_XqqKDV+AuP=Gf4kRKPqzyYTsOyGd3LE8Jqkwi7EMPJpbhA@mail.gmail.com>
<CAHGQGwHc5yH4Nxp59KXJP0kAr61j3W7QeSKT2HxVjZa3OrLzmg@mail.gmail.com>
<CAO6_Xqq1h6kggb1o206rgouPS0H5jnjahzZ0We-9ggnBjB2JsA@mail.gmail.com>
<CAHGQGwFJnNUOMiW9wR-2WjSKzzj0wV8p55J8bnJ6mik=z0oFPQ@mail.gmail.com>
<[email protected]>
<CAO6_Xqq73TPa3M6nQ7RqRhKkcphy1JX7aNGTYy-x_Sn+6a8Z_Q@mail.gmail.com>
On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
<[email protected]> wrote:
> So it was relying on GetInsertRecPtr() instead of
> GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
> only returns the position of the last full xlog page, meaning it
> doesn't fix the issue we have where the last partial page contains a
> continuation record.
>
> Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
> get the shutdown stuck issue.
>
> Using GetXLogInsertRecPtr() is required to make sure the last partial
> page is correctly flushed.
Since GetXLogInsertRecPtr() returns a bogus LSN during recovery, and
XLogFlush() does almost nothing there anyway, I added a
!RecoveryInProgress() check as follows. I've attached the latest version
of the patch and updated the commit message.
- if (got_STOPPING)
- XLogBackgroundFlush();
+ if (got_STOPPING && !RecoveryInProgress())
+ XLogFlush(GetXLogInsertRecPtr());
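A minimal toy model of the page-boundary problem, assuming (as the quoted message describes) that GetInsertRecPtr() only reflects WAL up to the last full page, while GetXLogInsertRecPtr() returns the exact insert position. The ToyLSN type, the toy_* functions and the 8 kB page size are hypothetical illustrations, not PostgreSQL code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model (not PostgreSQL code): an LSN is a byte offset
 * into the WAL stream. */
typedef uint64_t ToyLSN;

/* Assumption: 8 kB WAL pages, matching PostgreSQL's default XLOG_BLCKSZ. */
#define TOY_WAL_PAGE_SIZE ((ToyLSN) 8192)

/* Simplified stand-in for GetInsertRecPtr() as described in the quoted
 * message: it only reflects WAL up to the last full page boundary. */
static ToyLSN toy_get_insert_rec_ptr(ToyLSN insert_pos)
{
    return insert_pos - (insert_pos % TOY_WAL_PAGE_SIZE);
}

/* Simplified stand-in for GetXLogInsertRecPtr(): the exact insert position. */
static ToyLSN toy_get_xlog_insert_rec_ptr(ToyLSN insert_pos)
{
    return insert_pos;
}

/* True if a record [start, end) begins on one page and ends on a later
 * one, i.e. its tail is a continuation on the next page. */
static bool toy_record_crosses_page(ToyLSN start, ToyLSN end)
{
    assert(end > start);
    return start / TOY_WAL_PAGE_SIZE != (end - 1) / TOY_WAL_PAGE_SIZE;
}

/* True if flushing up to flush_target makes the whole record durable. */
static bool toy_flush_covers(ToyLSN flush_target, ToyLSN record_end)
{
    return flush_target >= record_end;
}
```

With a record that starts at byte 8000 and ends at byte 8400 (the current insert position), toy_get_insert_rec_ptr(8400) yields 8192, so flushing to it leaves the continuation on the second page unflushed, whereas flushing to toy_get_xlog_insert_rec_ptr(8400) covers the whole record.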
Regards,
--
Fujii Masao
Attachments:
[application/octet-stream] v6-0001-Fix-publisher-shutdown-hang-caused-by-logical-wal.patch (2.0K, 2-v6-0001-Fix-publisher-shutdown-hang-caused-by-logical-wal.patch)
From 1897c4b5979853b2fb4d679787ac9b633f183076 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <[email protected]>
Date: Tue, 3 Mar 2026 17:42:40 +0100
Subject: [PATCH v6] Fix publisher shutdown hang caused by logical walsender
busy loop.
Previously, when logical replication was running, shutting down
the publisher could cause the logical walsender to enter a busy loop
and prevent the publisher from completing shutdown.
During shutdown, the logical walsender waits for all pending WAL
to be written out. However, some WAL records could remain unflushed,
causing the walsender to wait indefinitely.
The issue occurred because the walsender used XLogBackgroundFlush() to
flush pending WAL. This function does not guarantee that all pending WAL
is flushed; it may leave the last, partially filled WAL page unwritten.
For example, WAL generated by an aborted transaction that was never
assigned a transaction ID might not be flushed.
This commit fixes the bug by making the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) instead when recovery is not in progress,
ensuring that all pending WAL up to the current insert position is written
and preventing the busy loop during shutdown.
Backpatch to all supported versions.
Author: Anthonin Bonnefoy <[email protected]>
Reviewed-by: Alexander Law <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Discussion: https://postgr.es/m/CAO6_Xqo3co3BuUVEVzkaBVw9LidBgeeQ_2hfxeLMQcXwovB3GQ@mail.gmail.com
Backpatch-through: 14
---
src/backend/replication/walsender.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2cde8ebc729..917d2a0c3f4 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1885,8 +1885,8 @@ WalSndWaitForWal(XLogRecPtr loc)
* otherwise we'd possibly end up waiting for WAL that never gets
* written, because walwriter has shut down already.
*/
- if (got_STOPPING)
- XLogBackgroundFlush();
+ if (got_STOPPING && !RecoveryInProgress())
+ XLogFlush(GetXLogInsertRecPtr());
/*
* To avoid the scenario where standbys need to catch up to a newer
--
2.51.2
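The busy loop described in the commit message can be sketched with a similarly hypothetical model. It assumes got_STOPPING is already set and recovery is not in progress, and it simplifies XLogBackgroundFlush() to "flush completed pages only"; the real function has more logic than this. ToyWal and the toy_* functions are illustrative names, not PostgreSQL APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model (not PostgreSQL code). */
typedef uint64_t ToyLSN;

/* Assumption: 8 kB WAL pages, matching PostgreSQL's default XLOG_BLCKSZ. */
#define TOY_WAL_PAGE_SIZE ((ToyLSN) 8192)

typedef struct ToyWal {
    ToyLSN insert_pos;   /* end of the last inserted record */
    ToyLSN flushed_upto; /* how far WAL has been made durable */
} ToyWal;

/* Simplified stand-in for XLogBackgroundFlush(): in this model it only
 * flushes completed pages, so a trailing partial page is left behind. */
static void toy_background_flush(ToyWal *wal)
{
    ToyLSN target = wal->insert_pos - (wal->insert_pos % TOY_WAL_PAGE_SIZE);

    if (target > wal->flushed_upto)
        wal->flushed_upto = target;
}

/* Simplified stand-in for XLogFlush(GetXLogInsertRecPtr()): flush through
 * the exact insert position, including a partial last page. */
static void toy_flush_to_insert_ptr(ToyWal *wal)
{
    if (wal->insert_pos > wal->flushed_upto)
        wal->flushed_upto = wal->insert_pos;
}

/* Models the shutdown wait: loop until WAL is flushed up to loc. Returns
 * the number of iterations needed, or -1 if loc is still not flushed after
 * max_iters iterations (the cap stands in for "stuck forever"). */
static int toy_wait_for_wal(ToyWal *wal, ToyLSN loc, bool use_full_flush,
                            int max_iters)
{
    assert(max_iters > 0);
    for (int i = 0; i < max_iters; i++) {
        if (wal->flushed_upto >= loc)
            return i;
        if (use_full_flush)
            toy_flush_to_insert_ptr(wal);
        else
            toy_background_flush(wal);
    }
    return -1;
}
```

With insert_pos = 8400 and loc = 8400, the page-only flush stalls at 8192 and toy_wait_for_wal() gives up after max_iters iterations, returning -1, while the full flush reaches 8400 on the first pass and returns 1. The iteration cap exists only so the sketch terminates; in the reported bug the real loop simply keeps waiting.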