[PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits

public inbox for [email protected]  
help / color / mirror / Atom feed

[PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
11+ messages / 7 participants
[nested] [flat]

* [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-19 05:46  JoongHyuk Shin <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: JoongHyuk Shin @ 2026-04-19 05:46 UTC (permalink / raw)
  To: [email protected] <[email protected]>

In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
The caller (LockBufferForCleanup) loops back, sets up another
deadlock_timeout,
and the signal gets sent again every interval.

The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
by adding a second ProcWaitForSignal() after the deadlock-check signal.
The buffer-pin path was left with an XXX comment asking "should we fix
this?".

The attached patch applies the same fix: after sending the deadlock-check
signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
so the startup process waits for UnpinBuffer() rather than looping
and re-signaling.

Patch attached.

Attachments:

  [application/octet-stream] 0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patch (2.4K, 3-0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patch)
  download | inline diff:
From 58239700edf0c669f4807da6140a595c2a1e8a5e Mon Sep 17 00:00:00 2001
From: JoongHyuk Shin <[email protected]>
Date: Fri, 17 Apr 2026 16:58:31 +0900
Subject: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin
 waits

After sending RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK, the startup process
returned without waiting, so the caller's loop would fire another
deadlock_timeout and re-send the signal every interval.  This added
unnecessary overhead in both the startup process and backends.

Fix by adding a ProcWaitForSignal() call after the deadlock-check
signal, mirroring the approach already used in the lock-conflict path
(commit 8900b5a9d59a).  This ensures the signal is sent at most once
per deadlock_timeout period rather than repeatedly.

Also remove the XXX comment that noted this problem.
---
 src/backend/storage/ipc/standby.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 29af7733948..9744db5715c 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -851,17 +851,19 @@ ResolveRecoveryConflictWithBufferPin(void)
 		/*
 		 * Send out a request for hot-standby backends to check themselves for
 		 * deadlocks.
-		 *
-		 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
-		 * to be signaled by UnpinBuffer() again and send a request for
-		 * deadlocks check if deadlock_timeout happens. This causes the
-		 * request to continue to be sent every deadlock_timeout until the
-		 * buffer is unpinned or ltime is reached. This would increase the
-		 * workload in the startup process and backends. In practice it may
-		 * not be so harmful because the period that the buffer is kept pinned
-		 * is basically no so long. But we should fix this?
 		 */
 		SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+
+		/*
+		 * Wait here to be signaled by UnpinBuffer(), to prevent the
+		 * subsequent ResolveRecoveryConflictWithBufferPin() call (from the
+		 * caller's loop) from firing another deadlock_timeout and re-sending
+		 * the deadlock-check signal.  Without this, the signal would be sent
+		 * every deadlock_timeout interval until the buffer is unpinned or
+		 * ltime is reached.
+		 */
+		got_standby_deadlock_timeout = false;
+		ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
 	}

 	/*
-- 
2.52.0

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-21 05:42  Fujii Masao <[email protected]>
  parent: JoongHyuk Shin <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Fujii Masao @ 2026-04-21 05:42 UTC (permalink / raw)
  To: JoongHyuk Shin <[email protected]>; +Cc: [email protected] <[email protected]>

On Sun, Apr 19, 2026 at 2:47 PM JoongHyuk Shin <[email protected]> wrote:
>
> In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
> the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
> The caller (LockBufferForCleanup) loops back, sets up another deadlock_timeout,
> and the signal gets sent again every interval.
>
> The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
> by adding a second ProcWaitForSignal() after the deadlock-check signal.
> The buffer-pin path was left with an XXX comment asking "should we fix this?".
>
> The attached patch applies the same fix: after sending the deadlock-check
> signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
> so the startup process waits for UnpinBuffer() rather than looping
> and re-signaling.
>
> Patch attached.

Thanks for the patch! LGTM.

Since this change improves recovery-conflict behavior rather than fixing a bug,
it doesn't seem to need backpatching and we may need to wait until v20
development opens (probably July) before committing it.

While reading the patch and ResolveRecoveryConflictWithBufferPin(), I also
noticed that got_standby_delay_timeout is not initialized to false before
enabling the timeout. This is unrelated to the patch, and I think it is
harmless in the current code, but would it be better to initialize it there,
as we already do for got_standby_deadlock_timeout?

                    if (ltime != 0)
                    {
    +                       got_standby_delay_timeout = false;
                            timeouts[cnt].id = STANDBY_TIMEOUT;
                            timeouts[cnt].type = TMPARAM_AT;
                            timeouts[cnt].fin_time = ltime;

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-21 05:55  Michael Paquier <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: Michael Paquier @ 2026-04-21 05:55 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: JoongHyuk Shin <[email protected]>; [email protected] <[email protected]>

On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> Since this change improves recovery-conflict behavior rather than fixing a bug,
> it doesn't seem to need backpatching and we may need to wait until v20
> development opens (probably July) before committing it.

Yeah, this one is an improvement, not an actual bug, so let's wait for
v20 if worth doing (I did not check it). 
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-27 11:07  JoongHyuk Shin <[email protected]>
  parent: Michael Paquier <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: JoongHyuk Shin @ 2026-04-27 11:07 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Fujii Masao <[email protected]>; [email protected] <[email protected]>

Thanks for the review.
v2 attached, with the suggested initialization added for symmetry.

Agreed this is an improvement rather than a bug fix,
so I've updated the CF tag to Performance accordingly.

I also verified the fix locally on a primary-standby setup,
using the buffer-pin conflict scenario from src/test/recovery/t/
031_recovery_conflict.pl
(aborted INSERT + cursor on standby + VACUUM FREEZE on primary).
On master, strace showed 9 SIGUSR1 broadcasts to the conflicting backend
over a 10-second window (one per deadlock_timeout).
With the patch applied, only 1 broadcast over the same window.

Patch attached.

--
JoongHyuk Shin

On Tue, Apr 21, 2026 at 2:55 PM Michael Paquier <[email protected]> wrote:

> On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > Since this change improves recovery-conflict behavior rather than fixing
> a bug,
> > it doesn't seem to need backpatching and we may need to wait until v20
> > development opens (probably July) before committing it.
>
> Yeah, this one is an improvement, not an actual bug, so let's wait for
> v20 if worth doing (I did not check it).
> --
> Michael
>


Attachments:

  [application/octet-stream] v2-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch (2.8K, 3-v2-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch)
  download | inline diff:
From e70a2fc74a63d4c1e3d1277e27a37b1e26710fff Mon Sep 17 00:00:00 2001
From: JoongHyuk Shin <[email protected]>
Date: Fri, 17 Apr 2026 16:58:31 +0900
Subject: [PATCH v2] Prevent repeated deadlock-check signals in standby buffer
 pin waits

After sending RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK, the startup process
returned without waiting, so the caller's loop would fire another
deadlock_timeout and re-send the signal every interval.  This added
unnecessary overhead in both the startup process and backends.

Fix by adding a ProcWaitForSignal() call after the deadlock-check
signal, mirroring the approach already used in the lock-conflict path
(commit 8900b5a9d59a).  This ensures the signal is sent at most once
per deadlock_timeout period rather than repeatedly.

Additionally, reset got_standby_delay_timeout to false before enabling
STANDBY_TIMEOUT, for symmetry with the existing got_standby_deadlock_timeout
reset.

Remove the XXX comment that noted this problem.
---
 src/backend/storage/ipc/standby.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 29af7733948..0dba1fb4289 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -818,6 +818,7 @@ ResolveRecoveryConflictWithBufferPin(void)
 
 		if (ltime != 0)
 		{
+			got_standby_delay_timeout = false;
 			timeouts[cnt].id = STANDBY_TIMEOUT;
 			timeouts[cnt].type = TMPARAM_AT;
 			timeouts[cnt].fin_time = ltime;
@@ -851,17 +852,19 @@ ResolveRecoveryConflictWithBufferPin(void)
 		/*
 		 * Send out a request for hot-standby backends to check themselves for
 		 * deadlocks.
-		 *
-		 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
-		 * to be signaled by UnpinBuffer() again and send a request for
-		 * deadlocks check if deadlock_timeout happens. This causes the
-		 * request to continue to be sent every deadlock_timeout until the
-		 * buffer is unpinned or ltime is reached. This would increase the
-		 * workload in the startup process and backends. In practice it may
-		 * not be so harmful because the period that the buffer is kept pinned
-		 * is basically no so long. But we should fix this?
 		 */
 		SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+
+		/*
+		 * Wait here to be signaled by UnpinBuffer(), to prevent the
+		 * subsequent ResolveRecoveryConflictWithBufferPin() call (from the
+		 * caller's loop) from firing another deadlock_timeout and re-sending
+		 * the deadlock-check signal.  Without this, the signal would be sent
+		 * every deadlock_timeout interval until the buffer is unpinned or
+		 * ltime is reached.
+		 */
+		got_standby_deadlock_timeout = false;
+		ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
 	}
 
 	/*
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-05-18 20:32  Álvaro Herrera <[email protected]>
  parent: Michael Paquier <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: Álvaro Herrera @ 2026-05-18 20:32 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Fujii Masao <[email protected]>; JoongHyuk Shin <[email protected]>; [email protected] <[email protected]>; Vitaly Davydov <[email protected]>

Hello,

On 2026-Apr-21, Michael Paquier wrote:

> On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > Since this change improves recovery-conflict behavior rather than fixing a bug,
> > it doesn't seem to need backpatching and we may need to wait until v20
> > development opens (probably July) before committing it.
> 
> Yeah, this one is an improvement, not an actual bug, so let's wait for
> v20 if worth doing (I did not check it). 

Hmm, is this related to 
https://postgr.es/m/[email protected] ?
In there, Vitaly claims to be reporting a bug that goes back to pg15,
which contradicts this assessment.

Regards,

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"¿Qué importan los años?  Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo"  (Mafalda)






^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-05-22 08:41  JoongHyuk Shin <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: JoongHyuk Shin @ 2026-05-22 08:41 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: Michael Paquier <[email protected]>; Fujii Masao <[email protected]>; [email protected] <[email protected]>; Vitaly Davydov <[email protected]>

Hello.

Same function, different races, I think.
Vitaly reports a missed wake-up where deadlock_timeout never fires
(spurious SIGALRM from log_startup_progress_interval plus the lazy
setitimer in 09cf1d52).
This patch addresses the opposite,
deadlock_timeout does fire, but LockBufferForCleanup loops back and re-arms
it,
so the signal repeats once per second.

The two fixes do not overlap
(the added ProcWaitForSignal sits inside the deadlock branch that Vitaly's
scenario never reaches).

-- 
JH Shin

On Wed, May 20, 2026 at 7:15 AM Álvaro Herrera <[email protected]> wrote:

> Hello,
>
> On 2026-Apr-21, Michael Paquier wrote:
>
> > On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > > Since this change improves recovery-conflict behavior rather than
> fixing a bug,
> > > it doesn't seem to need backpatching and we may need to wait until v20
> > > development opens (probably July) before committing it.
> >
> > Yeah, this one is an improvement, not an actual bug, so let's wait for
> > v20 if worth doing (I did not check it).
>
> Hmm, is this related to
> https://postgr.es/m/[email protected] ?
> In there, Vitaly claims to be reporting a bug that goes back to pg15,
> which contradicts this assessment.
>
> Regards,
>
> --
> Álvaro Herrera        Breisgau, Deutschland  —
> https://www.EnterpriseDB.com/
> "¿Qué importan los años?  Lo que realmente importa es comprobar que
> a fin de cuentas la mejor edad de la vida es estar vivo"  (Mafalda)
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-05-29 06:30  Michael Paquier <[email protected]>
  parent: JoongHyuk Shin <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Michael Paquier @ 2026-05-29 06:30 UTC (permalink / raw)
  To: JoongHyuk Shin <[email protected]>; +Cc: Álvaro Herrera <[email protected]>; Fujii Masao <[email protected]>; [email protected] <[email protected]>; Vitaly Davydov <[email protected]>

On Fri, May 22, 2026 at 05:41:03PM +0900, JoongHyuk Shin wrote:
> This patch addresses the opposite,
> deadlock_timeout does fire, but LockBufferForCleanup loops back and re-arms
> it, so the signal repeats once per second.

Right.  I don't really see why this should be backpatched.  One
argument would be more consistency of this area of the code across all
the stable branches, but the argument is kind of moot as this does not
fix a problem, just improves a bit what we have.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-05-29 21:08  Zsolt Parragi <[email protected]>
  parent: JoongHyuk Shin <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Zsolt Parragi @ 2026-05-29 21:08 UTC (permalink / raw)
  To: [email protected]

Hello!

Shouldn't this need a condition similar to
ResolveRecoveryConflictWithLock (if (logging_conflict) ...)?

Otherwise this can result in a long wait time, with:

log_recovery_conflict_waits = on
deadlock_timeout = 100ms
max_standby_streaming_delay = 5s

It can wait for 5 seconds instead of the current 0.1 seconds without the change.

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-05-30 11:51  Ilmar Y <[email protected]>
  parent: Zsolt Parragi <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Ilmar Y @ 2026-05-30 11:51 UTC (permalink / raw)
  To: [email protected]; +Cc: JoongHyuk Shin <[email protected]>

The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       tested, failed
Spec compliant:           not tested
Documentation:            not tested

Hi,

I looked at v2, focusing on the latest question about
log_recovery_conflict_waits behavior.

The patch applies cleanly on current master at
db5ed03217b9c238703df8b4b286115d6e940488.  git diff --check reports no
issues.  I built both master and v2 locally with:

./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
make -s -j8
make -s install

Both builds passed.

I tried to check the case Zsolt described.  In my local repro, the final
cancellation still happens at about max_standby_streaming_delay in both
master and v2.  The visible difference is that v2 delays the first
log_recovery_conflict_waits "recovery still waiting" message.

I used a small primary/standby setup based on the buffer-pin conflict pattern
from src/test/recovery/t/031_recovery_conflict.pl:

log_recovery_conflict_waits = on
deadlock_timeout = 100ms
max_standby_streaming_delay = 5s

The scenario is:

1. Create a small table.
2. Generate an aborted tuple on the primary and replay it on the standby.
3. Open a cursor on the standby and fetch one row, leaving the transaction
   idle with the cursor open.
4. Run VACUUM FREEZE on the primary.

On current master, the standby log shows:

recovery still waiting after 101.060 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.324 ms: recovery conflict on buffer pin

With v2, the standby log shows:

recovery still waiting after 5001.064 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.535 ms: recovery conflict on buffer pin

So in this repro the early "still waiting" log at deadlock_timeout is lost,
and the first waiting log is emitted only when the wait is already being
resolved at about max_standby_streaming_delay.

This seems to come from the new second ProcWaitForSignal() inside
ResolveRecoveryConflictWithBufferPin().  After the deadlock timeout fires, v2
sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and waits again internally.  For a
non-deadlock buffer pin holder, LockBufferForCleanup() does not get control
back after deadlock_timeout, so it cannot emit the early recovery conflict
log message.

This looks similar to the reason ResolveRecoveryConflictWithLock() has the
logging_conflict argument: if the conflict still needs to be logged, it returns
after sending the deadlock-check signal, and only avoids the repeated wait on
the later call after the conflict has been logged.

Should ResolveRecoveryConflictWithBufferPin() preserve the same behavior, for
example by skipping the new second wait until LockBufferForCleanup() has had a
chance to emit the first log_recovery_conflict_waits message?

I have not checked the repeated SIGUSR1 count with strace in my local macOS
environment, and I have not reviewed the backpatching question.

Regards,
Ilmar Yunusov

The new status of this patch is: Waiting on Author

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-06-03 08:27  JoongHyuk Shin <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: JoongHyuk Shin @ 2026-06-03 08:27 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Álvaro Herrera <[email protected]>; Fujii Masao <[email protected]>; [email protected] <[email protected]>; Vitaly Davydov <[email protected]>; [email protected]; [email protected]

Thanks for the reviews.

v3 attached.

* Emit "recovery still waiting" inside the function.
It now fires at deadlock_timeout instead of max_standby_streaming_delay
(Ilmar).

* Pass waitStart and &logged_recovery_conflict from the caller;
  the in-function branch reuses the same gate.

* An early-return alternative reopens a race in the
  SetStartupBufferPinWaitBufId(-1) gap; the lock path has
  no equivalent because its caller is structured differently.

* Covered by src/test/recovery/t/054_bufferpin_conflict_log_timing.pl
  (FAIL on v2, PASS on v3).

--
JH Shin

On Fri, May 29, 2026 at 3:31 PM Michael Paquier <[email protected]> wrote:

> On Fri, May 22, 2026 at 05:41:03PM +0900, JoongHyuk Shin wrote:
> > This patch addresses the opposite,
> > deadlock_timeout does fire, but LockBufferForCleanup loops back and
> re-arms
> > it, so the signal repeats once per second.
>
> Right.  I don't really see why this should be backpatched.  One
> argument would be more consistency of this area of the code across all
> the stable branches, but the argument is kind of moot as this does not
> fix a problem, just improves a bit what we have.
> --
> Michael
>


Attachments:

  [application/octet-stream] v3-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch (13.5K, 3-v3-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch)
  download | inline diff:
From 20995d3594f9870cea7182cf86ced183fe2cd199 Mon Sep 17 00:00:00 2001
From: JoongHyuk Shin <[email protected]>
Date: Sun, 31 May 2026 22:36:34 +0900
Subject: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin
 waits

When a startup process in hot standby waits on a buffer pin held by
backends and deadlock_timeout fires, ResolveRecoveryConflictWithBufferPin()
sent a deadlock-check signal to all backends and then returned to its
caller (LockBufferForCleanup).  The caller looped back and re-armed the
deadlock_timeout, so the signal was re-broadcast every deadlock_timeout
interval until the buffer was unpinned or max_standby_streaming_delay was
reached.  Commit 8900b5a9d59a (2021) fixed the same pattern in the
parallel lock path but left an XXX comment in the buffer pin path noting
the same issue had not been addressed.

After sending the deadlock-check signal, wait inside this function for
UnpinBuffer() to wake us up.  This makes the signal fire once per
buffer-pin wait instead of once per deadlock_timeout interval.

Because control no longer returns to the caller between the signal and
the unpin, the caller's loop-top "recovery still waiting" log emission
(bufmgr.c) would otherwise be skipped.  Emit that log message inside the
function instead, using the caller's waitStart and logged_recovery_conflict
state passed by reference.  This keeps the user-visible log timing at
deadlock_timeout (typically 100ms) rather than letting it slip to
max_standby_streaming_delay.

The lock path's logging_conflict early-return pattern was considered but
not adopted here.  The deadlock-check signal handler matches against the
bufid published by SetStartupBufferPinWaitBufId(), and LockBufferForCleanup
unpublishes between caller iterations.  An early-return fix would reopen
a race in that unpublish/re-publish gap where a backend with a deferred
signal handler sees bufid < 0 and no-ops.  The lock path has no such
publish/unpublish step because conflicting backends are identified
directly from the lock table, so the early-return pattern is safe there.

The log-timing fix is covered by
src/test/recovery/t/054_bufferpin_conflict_log_timing.pl, which asserts
the "still waiting" log appears well before max_standby_streaming_delay.
The repeated-signal behavior is verified manually via strace on the
startup process.
---
 src/backend/storage/buffer/bufmgr.c           |   3 +-
 src/backend/storage/ipc/standby.c             |  73 ++++++++--
 src/include/storage/standby.h                 |   3 +-
 src/test/recovery/meson.build                 |   1 +
 .../t/054_bufferpin_conflict_log_timing.pl    | 130 ++++++++++++++++++
 5 files changed, 198 insertions(+), 12 deletions(-)
 create mode 100644 src/test/recovery/t/054_bufferpin_conflict_log_timing.pl

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index cc398db124d..5b184f99103 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -6789,7 +6789,8 @@ LockBufferForCleanup(Buffer buffer)
 			/* Publish the bufid that Startup process waits on */
 			SetStartupBufferPinWaitBufId(buffer - 1);
 			/* Set alarm and then wait to be signaled by UnpinBuffer() */
-			ResolveRecoveryConflictWithBufferPin();
+			ResolveRecoveryConflictWithBufferPin(waitStart,
+												 &logged_recovery_conflict);
 			/* Reset the published bufid */
 			SetStartupBufferPinWaitBufId(-1);
 		}
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index de9092fdf5b..d3dc26efd7f 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -790,13 +790,26 @@ cleanup:
  * Deadlocks are extremely rare, and relatively expensive to check for,
  * so we don't do a deadlock check right away ... only if we have had to wait
  * at least deadlock_timeout.
+ *
+ * Parameters waitStart and logged_recovery_conflict are forwarded from the
+ * caller (LockBufferForCleanup) so that, when this function emits the
+ * "recovery still waiting" log after the deadlock-check signal, it uses the
+ * caller's wait-start timestamp and gate.  When log_recovery_conflict_waits
+ * is disabled, waitStart is 0 and the in-function emission is skipped.
+ * logged_recovery_conflict must be a valid (non-NULL) pointer; the function
+ * sets *logged_recovery_conflict to true after emission so that neither the
+ * caller's loop-top branch nor a later entry re-emits the same conflict.
+ * The caller's "waiting" ps-display suffix remains active across the second
+ * ProcWaitForSignal until UnpinBuffer() wakes us up.
  */
 void
-ResolveRecoveryConflictWithBufferPin(void)
+ResolveRecoveryConflictWithBufferPin(TimestampTz waitStart,
+									 bool *logged_recovery_conflict)
 {
 	TimestampTz ltime;
 
 	Assert(InHotStandby);
+	Assert(logged_recovery_conflict != NULL);
 
 	ltime = GetStandbyLimitTime();
 
@@ -818,6 +831,7 @@ ResolveRecoveryConflictWithBufferPin(void)
 
 		if (ltime != 0)
 		{
+			got_standby_delay_timeout = false;
 			timeouts[cnt].id = STANDBY_TIMEOUT;
 			timeouts[cnt].type = TMPARAM_AT;
 			timeouts[cnt].fin_time = ltime;
@@ -851,17 +865,56 @@ ResolveRecoveryConflictWithBufferPin(void)
 		/*
 		 * Send out a request for hot-standby backends to check themselves for
 		 * deadlocks.
-		 *
-		 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
-		 * to be signaled by UnpinBuffer() again and send a request for
-		 * deadlocks check if deadlock_timeout happens. This causes the
-		 * request to continue to be sent every deadlock_timeout until the
-		 * buffer is unpinned or ltime is reached. This would increase the
-		 * workload in the startup process and backends. In practice it may
-		 * not be so harmful because the period that the buffer is kept pinned
-		 * is basically no so long. But we should fix this?
 		 */
 		SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+
+		/*
+		 * Emit the "still waiting" log here, because the second
+		 * ProcWaitForSignal() below keeps control inside this function until
+		 * the buffer is unpinned, so the caller's loop-top log emission would
+		 * otherwise be skipped.  The caller passes its own waitStart and
+		 * logged_recovery_conflict so the emission semantics match what the
+		 * caller would have produced on its next iteration.
+		 *
+		 * The caller's loop-top branch additionally guards on
+		 * TimestampDifferenceExceeds(waitStart, now, DeadlockTimeout) before
+		 * logging.  That extra guard is redundant here: we just woke from
+		 * STANDBY_DEADLOCK_TIMEOUT, so DeadlockTimeout has elapsed since
+		 * waitStart by construction.  The caller-side waitStart != 0 check
+		 * is a shorthand for log_recovery_conflict_waits=on (the caller
+		 * sets waitStart only when that GUC is true), so this branch
+		 * stays a no-op when conflict-wait logging is disabled.
+		 */
+		if (waitStart != 0 && !*logged_recovery_conflict)
+		{
+			LogRecoveryConflict(RECOVERY_CONFLICT_BUFFERPIN,
+								waitStart, GetCurrentTimestamp(),
+								NULL, true);
+			*logged_recovery_conflict = true;
+		}
+
+		/*
+		 * Wait here to be signaled by UnpinBuffer(), to prevent the
+		 * subsequent ResolveRecoveryConflictWithBufferPin() call (from the
+		 * caller's loop) from firing another deadlock_timeout and re-sending
+		 * the deadlock-check signal.  Without this, the signal would be sent
+		 * every deadlock_timeout interval until the buffer is unpinned or
+		 * ltime is reached.
+		 *
+		 * The same wakeup assumption as the first wait above applies: only
+		 * UnpinBuffer() and STANDBY_TIMEOUT (if armed) can wake us here
+		 * (STANDBY_DEADLOCK_TIMEOUT was one-shot and has already fired).
+		 *
+		 * If STANDBY_TIMEOUT fires during this wait, control returns to the
+		 * caller (LockBufferForCleanup), which re-enters this function on
+		 * its next iteration; the GetCurrentTimestamp() >= ltime fast-path
+		 * at the top then sends the cancel signal.  We intentionally do not
+		 * check got_standby_delay_timeout here because the cancel is one
+		 * caller-loop iteration away, and adding the check would duplicate
+		 * logic with the fast-path branch.
+		 */
+		got_standby_deadlock_timeout = false;
+		ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
 	}
 
 	/*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 6a314c693cd..6aa0c9ac2bb 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -79,7 +79,8 @@ extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
 extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
 
 extern void ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict);
-extern void ResolveRecoveryConflictWithBufferPin(void);
+extern void ResolveRecoveryConflictWithBufferPin(TimestampTz waitStart,
+												 bool *logged_recovery_conflict);
 extern void CheckRecoveryConflictDeadlock(void);
 extern void StandbyDeadLockHandler(void);
 extern void StandbyTimeoutHandler(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 9eb8ed11425..64d05d5041c 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -62,6 +62,7 @@ tests += {
       't/051_effective_wal_level.pl',
       't/052_checkpoint_segment_missing.pl',
       't/053_standby_login_event_trigger.pl',
+      't/054_bufferpin_conflict_log_timing.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/054_bufferpin_conflict_log_timing.pl b/src/test/recovery/t/054_bufferpin_conflict_log_timing.pl
new file mode 100644
index 00000000000..570e5108b36
--- /dev/null
+++ b/src/test/recovery/t/054_bufferpin_conflict_log_timing.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Verify that "recovery still waiting" is logged after deadlock_timeout
+# during a buffer pin recovery conflict, rather than only at
+# max_standby_streaming_delay.
+#
+# This guards against a regression where the startup process waits the
+# full max_standby_streaming_delay before returning to the caller, which
+# delays the conflict log emitted from LockBufferForCleanup().
+#
+# Reported by Ilmar Yunusov on the v2 thread of "Standby deadlock check
+# repeats signal every deadlock_timeout".
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use Time::HiRes qw(usleep);
+
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+
+# Ilmar's reported environment: short deadlock_timeout, long streaming
+# delay, conflict-wait logging enabled.
+$node_primary->append_conf(
+	'postgresql.conf', qq[
+log_recovery_conflict_waits = on
+deadlock_timeout = 100ms
+max_standby_streaming_delay = 5s
+]);
+$node_primary->start;
+
+my $backup_name = 'bp_backup';
+$node_primary->backup($backup_name);
+
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby->start;
+
+my $test_db = 'bp_db';
+$node_primary->safe_psql('postgres', "CREATE DATABASE $test_db");
+
+my $table = 'bp_table';
+$node_primary->safe_psql(
+	$test_db, qq[
+CREATE TABLE $table (a int, b int);
+INSERT INTO $table SELECT 0, 0 FROM generate_series(1, 20);
+]);
+
+# Create dead tuples that VACUUM FREEZE will prune; this is what
+# triggers the buffer pin recovery conflict on the standby.
+$node_primary->safe_psql(
+	$test_db, qq[
+BEGIN;
+INSERT INTO $table VALUES (1, 0);
+ROLLBACK;
+BEGIN; LOCK $table; COMMIT;
+]);
+$node_primary->wait_for_replay_catchup($node_standby);
+
+my $psql_standby =
+  $node_standby->background_psql($test_db, on_error_stop => 0);
+
+my $cursor = 'bp_cursor';
+my $res = $psql_standby->query_safe(
+	qq[
+BEGIN;
+DECLARE $cursor CURSOR FOR SELECT b FROM $table;
+FETCH FORWARD FROM $cursor;
+]);
+like($res, qr/^0$/m, 'standby cursor pins buffer');
+
+my $log_location = -s $node_standby->logfile;
+
+# Trigger the conflict.
+$node_primary->safe_psql($test_db, "VACUUM FREEZE $table;");
+
+# Wait for the "still waiting" log. Bound the wait below
+# max_standby_streaming_delay (5s) so a regression where the log only
+# appears at the streaming-delay boundary fails fast.
+my $deadline = time() + 7;
+my $found = 0;
+my $elapsed_ms;
+while (time() < $deadline)
+{
+	my $log = PostgreSQL::Test::Utils::slurp_file($node_standby->logfile,
+		$log_location);
+	if ($log =~ /recovery still waiting after (\d+)\.(\d+) ms: .*buffer pin/)
+	{
+		$elapsed_ms = $1 + 0;
+		$found = 1;
+		last;
+	}
+	usleep(50_000);    # 50ms poll
+}
+
+ok($found,
+	"buffer pin conflict 'still waiting' log appears before "
+	  . "max_standby_streaming_delay");
+
+if ($found)
+{
+	# The log should appear close to deadlock_timeout (100ms), not
+	# anywhere near max_standby_streaming_delay (5000ms).  The v2
+	# regression manifests as the log appearing at approximately 5001ms.
+	# Use a 4000ms upper bound: tight enough to catch the regression
+	# (which fires at 5001ms) but loose enough to absorb scheduling
+	# noise on slow buildfarm animals (valgrind, CLOBBER_CACHE_ALWAYS,
+	# ASan, etc.) where the typical-case latency of ~100ms can grow
+	# 10x.
+	cmp_ok($elapsed_ms, '<', 4000,
+		"'still waiting' elapsed ($elapsed_ms ms) is near "
+		  . "deadlock_timeout, not max_standby_streaming_delay");
+}
+
+$psql_standby->reconnect_and_clear();
+
+# The standby session was terminated by the conflict cancel.  Wait for
+# replay to catch up before stopping the cluster, mirroring the pattern
+# in t/031_recovery_conflict.pl; without this, shutdown can race the
+# FATAL message and leave the test output flaky on slow buildfarm
+# animals.
+$node_primary->wait_for_replay_catchup($node_standby);
+
+$node_standby->stop;
+$node_primary->stop;
+
+done_testing();
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-06-03 10:22  Ilmar Yunusov <[email protected]>
  parent: JoongHyuk Shin <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Ilmar Yunusov @ 2026-06-03 10:22 UTC (permalink / raw)
  To: [email protected]; +Cc: JoongHyuk Shin <[email protected]>

The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hi,

I looked at v3 again, this time on Linux, focusing on the repeated SIGUSR1
behavior and the log_recovery_conflict_waits timing issue I reported for v2.

I used the v3 attachment from JoongHyuk's 2026-06-03 message, on
origin/master at f2081a7800f1696cb0415bacd655cb41b7b9ca63.

The patch applies cleanly with git am, and git diff --check reports no
issues.

I built with:

./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-tap-tests
make -s -j3
make -s install

That passed.

The new targeted TAP test passed:

make -C src/test/recovery check PROVE_TESTS=t/054_bufferpin_conflict_log_timing.pl

Result:

t/054_bufferpin_conflict_log_timing.pl .. ok
All tests successful.
Files=1, Tests=3
Result: PASS

I also ran the full recovery TAP suite:

make -C src/test/recovery check

That passed too:

All tests successful.
Files=53, Tests=633
Result: PASS

Six tests were skipped because injection points were not supported by this
build.

For the signal behavior, I ran the same buffer-pin conflict reproducer under
strace on the standby postmaster and its children:

strace -ff -qq -e trace=kill,tgkill,tkill

The count below is for kill/tgkill/tkill(..., SIGUSR1) syscalls during the
conflict window, after subtracting signals already seen before VACUUM FREEZE.

On unpatched master:

sigusr1_delta=51
recovery still waiting after 100.442 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.455 ms: recovery conflict on buffer pin

With v3:

sigusr1_delta=2
recovery still waiting after 100.479 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.778 ms: recovery conflict on buffer pin

I interpret the two v3 SIGUSR1 syscalls as the one deadlock-check signal and
the final cancellation signal at max_standby_streaming_delay. So in this
repro, v3 removes the repeated deadlock-check signals every deadlock_timeout,
while keeping the "recovery still waiting" log near deadlock_timeout.

I did not find a new issue in the checked path.

I have not reviewed the backpatching question, and I did not run
installcheck-world.

Regards,
Ilmar Yunusov

The new status of this patch is: Ready for Committer

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

end of thread, other threads:[~2026-06-03 10:22 UTC | newest]

Thread overview: 11+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <[email protected]>
2026-04-21 05:42 ` Fujii Masao <[email protected]>
2026-04-21 05:55   ` Michael Paquier <[email protected]>
2026-04-27 11:07     ` JoongHyuk Shin <[email protected]>
2026-05-29 21:08       ` Zsolt Parragi <[email protected]>
2026-05-30 11:51         ` Ilmar Y <[email protected]>
2026-05-18 20:32     ` Álvaro Herrera <[email protected]>
2026-05-22 08:41       ` JoongHyuk Shin <[email protected]>
2026-05-29 06:30         ` Michael Paquier <[email protected]>
2026-06-03 08:27           ` JoongHyuk Shin <[email protected]>
2026-06-03 10:22             ` Ilmar Yunusov <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox