public inbox for [email protected]  
help / color / mirror / Atom feed
[PATCH] Fix premature timeout in pg_promote() caused by signal interruptions
4+ messages / 2 participants
[nested] [flat]

* [PATCH] Fix premature timeout in pg_promote() caused by signal interruptions
@ 2026-03-11 16:44  Robert Pang <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Robert Pang @ 2026-03-11 16:44 UTC (permalink / raw)
  To: pgsql-hackers

Hi all,

We have observed an issue where pg_promote() returns false and issues
a timeout warning prematurely, even if the standby server is
successfully promoted later within the specified timeout period.

Problem Description

The current implementation of pg_promote() calculates a fixed number
of loop iterations based on the timeout value, assuming each loop
waits exactly 100 ms for the backend latch. However, if the backend
receives an unrelated signal (e.g., from
client_connection_check_interval), it wakes up early. These repeated,
unrelated wakeups cause the loop counter to deplete much faster than
intended, leading to a premature timeout.

Reproduction

Set up a standby server while modifying pg_promote not to write the
promote file to block the promotion. And by setting
client_connection_check_interval = 1, we can consistently trigger a
premature timeout. In the example below, a 10-second timeout expires
in roughly 107 ms:

postgres=# set client_connection_check_interval=1;
SET
postgres=# \timing
Timing is on.
postgres=# select pg_promote(true, 10);
WARNING:  server did not promote within 10 seconds
┌────────────┐
│ pg_promote │
├────────────┤
│ f          │
└────────────┘
(1 row)

Time: 107.783 ms

Proposed Fix

The attached patch modifies the logic to loop based on the actual
elapsed time rather than a fixed number of iterations. This ensures
that pg_promote() respects the specified timeout regardless of how
many times the backend latch is signaled.

After applying the patch, the timeout behaves as expected:

postgres=# set client_connection_check_interval=1;
SET
postgres=# \timing
Timing is on.
postgres=# select pg_promote(true, 10);
WARNING:  server did not promote within 10 seconds
┌────────────┐
│ pg_promote │
├────────────┤
│ f          │
└────────────┘
(1 row)

Time: 10000.865 ms (00:10.001)

We would like to submit this patch for the community's consideration.

Best regards
Robert Pang
Google


Attachments:

  [application/x-patch] 0001-Fix-premature-timeout-in-pg_promote-caused-by-signal.patch (1.9K, 2-0001-Fix-premature-timeout-in-pg_promote-caused-by-signal.patch)
  download | inline diff:
From 2075aecdcc6a70fea1b33c9e90878e90c2a61612 Mon Sep 17 00:00:00 2001
From: Robert Pang <[email protected]>
Date: Tue, 10 Mar 2026 21:11:49 -0700
Subject: [PATCH] Fix premature timeout in pg_promote() caused by signal
 interruptions

Previously, pg_promote() looped a fixed number of times calculated from the
specified timeout and waited 100 ms on the backend latch per iteration for
standby promotion. However, unrelated signals to the backend could set the
latch and wake up the backend early, resulting in a wait time significantly
shorter than the specified timeout if the signals happened frequently.

This commit refines the logic to track actual elapsed time. By looping until the
requested duration has truly passed, pg_promote() now ensures that the wait does
not end prematurely due to signals.

Author: Robert Pang
---
 src/backend/access/transam/xlogfuncs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index ecb3c8c0820..6137af2d35e 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -690,7 +690,7 @@ pg_promote(PG_FUNCTION_ARGS)
 	bool		wait = PG_GETARG_BOOL(0);
 	int			wait_seconds = PG_GETARG_INT32(1);
 	FILE	   *promote_file;
-	int			i;
+	TimestampTz	end_time;
 
 	if (!RecoveryInProgress())
 		ereport(ERROR,
@@ -731,8 +731,8 @@ pg_promote(PG_FUNCTION_ARGS)
 		PG_RETURN_BOOL(true);
 
 	/* wait for the amount of time wanted until promotion */
-#define WAITS_PER_SECOND 10
-	for (i = 0; i < WAITS_PER_SECOND * wait_seconds; i++)
+	end_time = GetCurrentTimestamp() + wait_seconds * 1000000L;
+	while (GetCurrentTimestamp() < end_time)
 	{
 		int			rc;
 
@@ -745,7 +745,7 @@ pg_promote(PG_FUNCTION_ARGS)
 
 		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
-					   1000L / WAITS_PER_SECOND,
+					   100L,
 					   WAIT_EVENT_PROMOTE);
 
 		/*
-- 
2.53.0.473.g4a7958ca14-goog



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Fix premature timeout in pg_promote() caused by signal interruptions
@ 2026-03-25 02:28  Michael Paquier <[email protected]>
  parent: Robert Pang <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Michael Paquier @ 2026-03-25 02:28 UTC (permalink / raw)
  To: Robert Pang <[email protected]>; +Cc: pgsql-hackers

On Wed, Mar 11, 2026 at 09:44:07AM -0700, Robert Pang wrote:
> The current implementation of pg_promote() calculates a fixed number
> of loop iterations based on the timeout value, assuming each loop
> waits exactly 100 ms for the backend latch. However, if the backend
> receives an unrelated signal (e.g., from
> client_connection_check_interval), it wakes up early. These repeated,
> unrelated wakeups cause the loop counter to deplete much faster than
> intended, leading to a premature timeout.

It is true that we can do better here, and your proposal about having
a more precise timeout calculation looks like a sensible improvement
for this case.

No objections regarding your patch.  I would like to apply it on HEAD,
if there are no objections.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Fix premature timeout in pg_promote() caused by signal interruptions
@ 2026-03-26 04:05  Robert Pang <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Robert Pang @ 2026-03-26 04:05 UTC (permalink / raw)
  To: getiancheng_2012 <[email protected]>; +Cc: Michael Paquier <[email protected]>; pgsql-hackers

Hi Michael and Tiancheng

On Tue, Mar 24, 2026 at 8:06 PM getiancheng_2012 <[email protected]>
wrote:

> ---- Replied Message ----
> From Michael Paquier<[email protected]> <[email protected]>
> Date 3/25/2026 10:28
> To Robert Pang<[email protected]> <[email protected]>
> Cc pgsql-hackers<[email protected]>
> <[email protected]>
> Subject Re: [PATCH] Fix premature timeout in pg_promote() caused by
> signal interruptions
>
> On Wed, Mar 11, 2026 at 09:44:07AM -0700, Robert Pang wrote:
>
> The current implementation of pg_promote() calculates a fixed number
>  of loop iterations based on the timeout value, assuming each loop
>  waits exactly 100 ms for the backend latch. However, if the backend
>  receives an unrelated signal (e.g., from
>  client_connection_check_interval), it wakes up early. These repeated,
>  unrelated wakeups cause the loop counter to deplete much faster than
>  intended, leading to a premature timeout.
>
>
> It is true that we can do better here, and your proposal about having
> a more precise timeout calculation looks like a sensible improvement
> for this case.
>
> No objections regarding your patch.  I would like to apply it on HEAD,
> if there are no objections.
> --
> Michael
>
>
> Hi all,
>
> Overall LGTM. Just a small comment:
>
> "+  end_time = GetCurrentTimestamp() + wait_seconds * 1000000L;"
>
> I think we can use TimestampTzPlusSeconds(GetCurrentTimestamp(), wait_seconds).
>
>
> Best regards
>
> Tiancheng Ge
>
>
Thank you for reviewing this patch. The use of TimestampTzPlusSeconds()
will be good.

Best regards
Robert Pang


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Fix premature timeout in pg_promote() caused by signal interruptions
@ 2026-03-26 04:39  Michael Paquier <[email protected]>
  parent: Robert Pang <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Michael Paquier @ 2026-03-26 04:39 UTC (permalink / raw)
  To: Robert Pang <[email protected]>; +Cc: getiancheng_2012 <[email protected]>; pgsql-hackers

On Wed, Mar 25, 2026 at 09:05:34PM -0700, Robert Pang wrote:
> Thank you for reviewing this patch. The use of TimestampTzPlusSeconds()
> will be good.

Yep.  Included the suggestion and applied the patch on HEAD.  Thanks,
all.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2026-03-26 04:39 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-03-11 16:44 [PATCH] Fix premature timeout in pg_promote() caused by signal interruptions Robert Pang <[email protected]>
2026-03-25 02:28 ` Michael Paquier <[email protected]>
2026-03-26 04:05   ` Robert Pang <[email protected]>
2026-03-26 04:39     ` Michael Paquier <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox