Re: client_connection_check_interval default value

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: client_connection_check_interval default value
22+ messages / 9 participants
[nested] [flat]

* Re: client_connection_check_interval default value
@ 2026-02-05 23:42  Fujii Masao <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Fujii Masao @ 2026-02-05 23:42 UTC (permalink / raw)
  To: Jeremy Schneider <[email protected]>; +Cc: Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider
<[email protected]> wrote:
>
> One interesting thing to me - it seems like all of the past mail
> threads were focused on a situation different from mine. Lots of
> discussion about freeing resources like CPU.
>
> In the outage I saw, the system was idle and we completely ran out of
> max_connections because all sessions were waiting on a row lock.
>
> Importantly, the app was closing these conns but we had sockets stacking
> up on the server in CLOSE-WAIT state - and postgres simply never
> cleaned them up until we had an outage. The processes were completely
> idle waiting for a row lock that was not going to be released.
>
> Impact could have been isolated to sessions hitting that row (with this
> GUC), but it escalated to a system outage. It's pretty simple to
> reproduce this:
> https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion
>
>
> On Thu, 5 Feb 2026 09:26:34 -0800
> Jacob Champion <[email protected]> wrote:
>
> > On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider
> > <[email protected]> wrote:
> > > While a fix has been merged in pgx for the most direct root cause of
> > > the incident I saw, this setting just seems like a good behavior to
> > > make Postgres more robust in general.
> >
> > At the risk of making perfect the enemy of better, the protocol-level
> > heartbeat mentioned in the original thread [1] would cover more use
> > cases, which might give it a better chance of eventually becoming
> > default behavior. It might also be a lot of work, though.
>
> It seems like a fair bit of discussion is around OS coverage - even
> Thomas' message there references keepalive working as expected on
> Linux. Tom objects in 2023 that "the default behavior would then be
> platform-dependent and that's a documentation problem we could do
> without."
>
> But it's been five years - has there been further work on implementing
> a postgres-level heartbeat? And I see other places in the docs where we
> note platform differences, is it really such a big problem to change
> the default here?
>
>
> On Thu, 5 Feb 2026 10:00:29 -0500
> Greg Sabino Mullane <[email protected]> wrote:
>
> > I'm a weak -1 on this. Certainly not 2s! That's a lot of context
> > switching for a busy system for no real reason. Also see this past
> > discussion:
>
> In the other thread I see larger perf concerns with some early
> implementations before they refactored the patch? Konstantin's message
> on 2019-08-02 said he didn't see much difference, and the value of the
> timeout didn't seem to matter, and if anything the marginal effect was
> simply from the presence of any timer (same effect as setting
> statement_timeout) - and later on the thread it seems like Thomas also
> saw minimal performance concern here.
>
> I did see a real system outage that could have been prevented by an
> appropriate default value here, since I didn't yet know to change it.

I'm not sure that client_connection_check_interval needs to be enabled
by default. However, if we do agree to change the default and apply it,
I think we should first address the related issue: with log_lock_waits enabled
by default, setting client_connection_check_interval to 2s would cause
"still waiting" messages to be logged every 2 seconds during waiting on
the lock. That could result in a lot of noisy logging under default settings.

The issue is that backends blocked in ProcSleep() are woken up every
client_connection_check_interval and may emit a "still waiting" message
each time if log_lock_waits is enabled. To mitigate this, just one idea is
to add a flag to track whether the "still waiting" message has already been
emitted during a call to ProcSleep(), and suppress further messages
once it has been logged.

Regards,

-- 
Fujii Masao






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-06 00:01  Tom Lane <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 4 replies; 22+ messages in thread

From: Tom Lane @ 2026-02-06 00:01 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Jeremy Schneider <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

Fujii Masao <[email protected]> writes:
> On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider
> <[email protected]> wrote:
>> I did see a real system outage that could have been prevented by an
>> appropriate default value here, since I didn't yet know to change it.

> I'm not sure that client_connection_check_interval needs to be enabled
> by default.

I think enabling it by default is a nonstarter, because it changes
behavior in a significant way.  Specifically, it's always been the
case that if the client disconnects during a non-SELECT query (or
anything that doesn't produce output), the backend would complete that
query before ending the session.  I think it's very likely that there
are users depending on that behavior.  Jeremy is describing an
application that evidently was built on the assumption that
disconnecting early would abort a wait, but that assumption was not
based on any testing.  I think it's good that we now have an option
to make such an assumption hold, but that does not translate to
"we should force that behavior on everybody".  Whether or not the
overhead is insignificant, there's a good chance that the change
would make more people unhappy than happy.

> The issue is that backends blocked in ProcSleep() are woken up every
> client_connection_check_interval and may emit a "still waiting" message
> each time if log_lock_waits is enabled. To mitigate this, just one idea is
> to add a flag to track whether the "still waiting" message has already been
> emitted during a call to ProcSleep(), and suppress further messages
> once it has been logged.

Independently of what's the default, it seems like it'd be valuable to
make that interaction better.  I think it is reasonable to keep on
emitting "still waiting" every so often, but we could probably
rate-limit that to a lot less than every 2 seconds.

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-06 00:18  Jacob Champion <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 1 reply; 22+ messages in thread

From: Jacob Champion @ 2026-02-06 00:18 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jeremy Schneider <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Thu, Feb 5, 2026 at 4:01 PM Tom Lane <[email protected]> wrote:
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way.  Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.

Ha, I hadn't even thought about the possibility of fire-and-forget...

> I think it's very likely that there
> are users depending on that behavior.

From a quick search, yeah, probably:

    https://dba.stackexchange.com/questions/247206/will-query-continue-to-run-even-after-network-times-o...

--Jacob






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-06 01:02  Jeremy Schneider <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 0 replies; 22+ messages in thread

From: Jeremy Schneider @ 2026-02-06 01:02 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Thu, 05 Feb 2026 19:01:52 -0500
Tom Lane <[email protected]> wrote:

> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way.  Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.  I think it's very likely that there
> are users depending on that behavior.  Jeremy is describing an
> application that evidently was built on the assumption that
> disconnecting early would abort a wait, but that assumption was not
> based on any testing.  I think it's good that we now have an option
> to make such an assumption hold, but that does not translate to
> "we should force that behavior on everybody".  Whether or not the
> overhead is insignificant, there's a good chance that the change
> would make more people unhappy than happy.

This application was trying to kill the connection, the postgres client
attempted to send a cancel message but the cancel message was lost and
the client naturally falls back to just hard killing the network
connection.

I totally understand the cancel messages should not have gotten lost.

But I'm surprised that we'd assume most users who kill -9 their hung
client want their 10 hour query to keep running forever? Yes we always
try to gracefully stop things first and let cancel messages get
through, but I'd think if server received a FIN over the network then we
could interpret that similar to a cancel message. This would be a
change in a new major version; we wouldn't backpatch a GUC default.

I see the concern around the behavior change, but I don't feel like
this difference between CTRL-C and kill-9 should be sancrosact, if
having them behave similarly seems generally reasonable. Postgres has
made more significant changes than this before in major versions.

-Jeremy

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-06 01:36  Thomas Munro <[email protected]>
  parent: Jacob Champion <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Thomas Munro @ 2026-02-06 01:36 UTC (permalink / raw)
  To: Jacob Champion <[email protected]>; +Cc: Tom Lane <[email protected]>; Fujii Masao <[email protected]>; Jeremy Schneider <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>

Some space cadet thoughts I recall wondering about back then: If one
process had all the sockets, I suppose something could watch for
POLLRDHUP/KEV_EOF from all of them and send an interrupt to the owner
to handle in its next CFI, so that we don't have to have every backend
polling its socket.  Once we have a multithreaded server we could try
that, but it might become more reasonable to consider with
hypothetical models of socket management where there are executor
threads that are not tried to a session and socket, which implies a
new bit of architecture handling network events on behalf of sessions.
I suppose in theory it could be the postmaster today (just don't close
the client sockets after fork, and teach the postmaster's main loop to
do that), and I vaguely recall something along those lines being shot
down in some other thread... it might cause some extra contention in
the kernel depending how it's implemented (IDK), but it'd obviously
eat a lot of fd table entries which is probably what killed the
notion... though we'll need that large socket table eventually (and
perhaps we'll pay for it by sharing file fds somehow...).  But either way,
heartbeats might indeed make more sense, whatever the architecture.

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-06 09:40  Laurenz Albe <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 0 replies; 22+ messages in thread

From: Laurenz Albe @ 2026-02-06 09:40 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; Fujii Masao <[email protected]>; +Cc: Jeremy Schneider <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Thu, 2026-02-05 at 19:01 -0500, Tom Lane wrote:
> Fujii Masao <[email protected]> writes:
> > On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider
> > <[email protected]> wrote:
> > > I did see a real system outage that could have been prevented by an
> > > appropriate default value here, since I didn't yet know to change it.
> 
> > I'm not sure that client_connection_check_interval needs to be enabled
> > by default.
> 
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way.  Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.  I think it's very likely that there
> are users depending on that behavior.

*Perhaps* there are some users who depend on the current behavior, but
my experience is that the vast majority of users don't want that statements
started by a connection that went dead should keep running.

I mean, it would be a change in behavior, but that is normal during a
major upgrade, and users who actively want the current behavior can
disable client_connection_check_interval.

I think that enabling client_connection_check_interval would be a net win,
as far as the core functionality is concerned.

Fujii Masao's concern that log_lock_waits would issue a message every
client_connection_check_interval is much more serious in my opinion, now
that log_lock_waits is enabled by default (at - erm - my insistence).

Why does the deadlock detector kick in every client_connection_check_interval?

Yours,
Laurenz Albe

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-18 05:30  Fujii Masao <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 2 replies; 22+ messages in thread

From: Fujii Masao @ 2026-02-18 05:30 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Jeremy Schneider <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <[email protected]> wrote:
> > The issue is that backends blocked in ProcSleep() are woken up every
> > client_connection_check_interval and may emit a "still waiting" message
> > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > to add a flag to track whether the "still waiting" message has already been
> > emitted during a call to ProcSleep(), and suppress further messages
> > once it has been logged.
>
> Independently of what's the default, it seems like it'd be valuable to
> make that interaction better.  I think it is reasonable to keep on
> emitting "still waiting" every so often, but we could probably
> rate-limit that to a lot less than every 2 seconds.

Attached is a patch that rate-limits the "still waiting on lock" message
to at most once every 10s.

I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
discuss the appropriate interval and adjust it if needed. The value is
currently hard-coded, as making it configurable does not seem necessary.

Thoughts?

-- 
Fujii Masao


Attachments:

  [application/octet-stream] v1-0001-Rate-limit-repeated-still-waiting-on-lock-log-mes.patch (3.8K, 2-v1-0001-Rate-limit-repeated-still-waiting-on-lock-log-mes.patch)
  download | inline diff:
From aa9e5748f055b9ca76eff9a5e57f7cbfc01ca4aa Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Wed, 18 Feb 2026 00:58:02 +0900
Subject: [PATCH v1] Rate-limit repeated "still waiting on lock" log messages.

When log_lock_waits is enabled, the "still waiting on lock" message is normally
emitted only once while a session continues waiting. However, if the wait is
interrupted, for example by wakeups from client_connection_check_interval,
SIGHUP for configuration reloads, or similar events, the message could be
emitted again each time the wait resumes.

For example, with very small client_connection_check_interval values
(e.g., 100 ms), this behavior could flood the logs with repeated messages,
making them difficult to use.

To prevent this, this commit rate-limits the "still waiting on lock" message to
at most once every 10 seconds. The interval is now hardcoded, as making it
configurable is not considered necessary.
---
 src/backend/storage/lmgr/proc.c | 36 +++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fd8318bdf3d..10f612549f6 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1305,6 +1305,7 @@ ProcSleep(LOCALLOCK *locallock)
 	TimestampTz standbyWaitStart = 0;
 	bool		allow_autovacuum_cancel = true;
 	bool		logged_recovery_conflict = false;
+	TimestampTz last_logged = 0;
 	ProcWaitStatus myWaitStatus;
 	DeadLockState deadlock_state;
 
@@ -1553,6 +1554,7 @@ ProcSleep(LOCALLOCK *locallock)
 						lock_waiters_sbuf,
 						lock_holders_sbuf;
 			const char *modename;
+			TimestampTz now;
 			long		secs;
 			int			usecs;
 			long		msecs;
@@ -1565,9 +1567,9 @@ ProcSleep(LOCALLOCK *locallock)
 			DescribeLockTag(&buf, &locallock->tag.lock);
 			modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
 									   lockmode);
+			now = GetCurrentTimestamp();
 			TimestampDifference(get_timeout_start_time(DEADLOCK_TIMEOUT),
-								GetCurrentTimestamp(),
-								&secs, &usecs);
+								now, &secs, &usecs);
 			msecs = secs * 1000 + usecs / 1000;
 			usecs = usecs % 1000;
 
@@ -1602,12 +1604,30 @@ ProcSleep(LOCALLOCK *locallock)
 			}
 
 			if (myWaitStatus == PROC_WAIT_STATUS_WAITING)
-				ereport(LOG,
-						(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs),
-						 (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.",
-											   "Processes holding the lock: %s. Wait queue: %s.",
-											   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
+			{
+				/*
+				 * Rate-limit the "still waiting on lock" log message to at
+				 * most once every 10 seconds. XXX: This interval is
+				 * hardcoded, as making it configurable is unnecessary.
+				 *
+				 * Without this limit, the message can be emitted whenever the
+				 * lock-wait sleep is interrupted (for example by SIGHUP for
+				 * config reload or by client_connection_check_interval). For
+				 * example, if client_connection_check_interval is set very
+				 * low (e.g., 100 ms), the message could be logged repeatedly,
+				 * flooding the log and making it difficult to use.
+				 */
+				if (TimestampDifferenceExceeds(last_logged, now, 10000))
+				{
+					ereport(LOG,
+							(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
+									MyProcPid, modename, buf.data, msecs, usecs),
+							 (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.",
+												   "Processes holding the lock: %s. Wait queue: %s.",
+												   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
+					last_logged = now;
+				}
+			}
 			else if (myWaitStatus == PROC_WAIT_STATUS_OK)
 				ereport(LOG,
 						(errmsg("process %d acquired %s on %s after %ld.%03d ms",
-- 
2.51.2



^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-18 08:03  Laurenz Albe <[email protected]>
  parent: Fujii Masao <[email protected]>
  1 sibling, 1 reply; 22+ messages in thread

From: Laurenz Albe @ 2026-02-18 08:03 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; Tom Lane <[email protected]>; +Cc: Jeremy Schneider <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Wed, 2026-02-18 at 14:30 +0900, Fujii Masao wrote:
> On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <[email protected]> wrote:
> > > The issue is that backends blocked in ProcSleep() are woken up every
> > > client_connection_check_interval and may emit a "still waiting" message
> > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > to add a flag to track whether the "still waiting" message has already been
> > > emitted during a call to ProcSleep(), and suppress further messages
> > > once it has been logged.
> > 
> > Independently of what's the default, it seems like it'd be valuable to
> > make that interaction better.  I think it is reasonable to keep on
> > emitting "still waiting" every so often, but we could probably
> > rate-limit that to a lot less than every 2 seconds.
> 
> Attached is a patch that rate-limits the "still waiting on lock" message
> to at most once every 10s.
> 
> I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> discuss the appropriate interval and adjust it if needed. The value is
> currently hard-coded, as making it configurable does not seem necessary.

I think that 10 seconds is good.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-18 09:00  Ants Aasma <[email protected]>
  parent: Laurenz Albe <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Ants Aasma @ 2026-02-18 09:00 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; +Cc: Fujii Masao <[email protected]>; Tom Lane <[email protected]>; Jeremy Schneider <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Wed, 18 Feb 2026 at 10:03, Laurenz Albe <[email protected]> wrote:
> On Wed, 2026-02-18 at 14:30 +0900, Fujii Masao wrote:
> > On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <[email protected]> wrote:
> > > > The issue is that backends blocked in ProcSleep() are woken up every
> > > > client_connection_check_interval and may emit a "still waiting" message
> > > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > > to add a flag to track whether the "still waiting" message has already been
> > > > emitted during a call to ProcSleep(), and suppress further messages
> > > > once it has been logged.
> > >
> > > Independently of what's the default, it seems like it'd be valuable to
> > > make that interaction better.  I think it is reasonable to keep on
> > > emitting "still waiting" every so often, but we could probably
> > > rate-limit that to a lot less than every 2 seconds.
> >
> > Attached is a patch that rate-limits the "still waiting on lock" message
> > to at most once every 10s.
> >
> > I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> > discuss the appropriate interval and adjust it if needed. The value is
> > currently hard-coded, as making it configurable does not seem necessary.
>
> I think that 10 seconds is good.

I think 10 seconds is way too small. Having one long locker blocking a
couple hundred backends is something that happens somewhat regularly.
A 10s interval would result in tens of "still waiting" per second. It
will just make it harder to sift out what is actually going on between
all the waiters squawking "are we there yet?" in a loop.

I think something above 5 minutes would be more appropriate.

Regards,
Ants Aasma






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-24 06:01  Chao Li <[email protected]>
  parent: Fujii Masao <[email protected]>
  1 sibling, 0 replies; 22+ messages in thread

From: Chao Li @ 2026-02-24 06:01 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Tom Lane <[email protected]>; Jeremy Schneider <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>



> On Feb 18, 2026, at 13:30, Fujii Masao <[email protected]> wrote:
> 
> On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <[email protected]> wrote:
>>> The issue is that backends blocked in ProcSleep() are woken up every
>>> client_connection_check_interval and may emit a "still waiting" message
>>> each time if log_lock_waits is enabled. To mitigate this, just one idea is
>>> to add a flag to track whether the "still waiting" message has already been
>>> emitted during a call to ProcSleep(), and suppress further messages
>>> once it has been logged.
>> 
>> Independently of what's the default, it seems like it'd be valuable to
>> make that interaction better.  I think it is reasonable to keep on
>> emitting "still waiting" every so often, but we could probably
>> rate-limit that to a lot less than every 2 seconds.
> 
> Attached is a patch that rate-limits the "still waiting on lock" message
> to at most once every 10s.
> 
> I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> discuss the appropriate interval and adjust it if needed. The value is
> currently hard-coded, as making it configurable does not seem necessary.
> 
> Thoughts?
> 
> -- 
> Fujii Masao
> <v1-0001-Rate-limit-repeated-still-waiting-on-lock-log-mes.patch>

I feel 10 seconds is good.

The other thinking is that, the message will only be printed after the first deadlock check is fired. So, if someone sets deadlock_timeout to a large value, say 30 or 60 seconds, then any waiting log would already be very delayed. In that case, the user might not want to log more often than deadlock checks anyway. From this perspective, the rate limit timeout could be max(10s, deadlock_timeout). Anyway, this is not a strong opinion.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/










^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-02-26 01:30  Fujii Masao <[email protected]>
  parent: Ants Aasma <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Fujii Masao @ 2026-02-26 01:30 UTC (permalink / raw)
  To: Ants Aasma <[email protected]>; +Cc: Laurenz Albe <[email protected]>; Tom Lane <[email protected]>; Jeremy Schneider <[email protected]>; Jacob Champion <[email protected]>; [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

On Wed, Feb 18, 2026 at 6:00 PM Ants Aasma <[email protected]> wrote:
> I think 10 seconds is way too small. Having one long locker blocking a
> couple hundred backends is something that happens somewhat regularly.
> A 10s interval would result in tens of "still waiting" per second. It
> will just make it harder to sift out what is actually going on between
> all the waiters squawking "are we there yet?" in a loop.
>
> I think something above 5 minutes would be more appropriate.

For that scenario, wouldn't it be better to emit the "still waiting" message
only once per lock wait (i.e., use the same behavior) regardless of
the client_connection_check_interval setting, rather than repeating it
every several minutes? Also logging it again after a long delay like 5min
could be confusing to users.

I used 10s in the patch, on the other hand I'm also ok with preserving
the existing behavior (emit once per wait) whether
client_connection_check_interval is set or not. Even without periodic
log messages, ongoing lock waits can still be monitored via pg_locks.

Regards,

-- 
Fujii Masao

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-09 14:12  Fujii Masao <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 2 replies; 22+ messages in thread

From: Fujii Masao @ 2026-03-09 14:12 UTC (permalink / raw)
  To: Hüseyin Demir <[email protected]>; +Cc: [email protected]

On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <[email protected]> wrote:
>
> Hi Fujii,
>
> Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
>
> 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as the rate limit interval is worth adopting. If someone has set deadlock_timeout to, say, 30s or 60s, they've already signaled they don't need frequent lock-wait feedback. Logging every 10s after a 60s deadlock_timeout feels inconsistent with that intent.

Or perhaps they expect the log message to be emitted only once,
just after deadlock_timeout, similar to the current behavior when
client_connection_check_interval is not set, I guess.

I'm now starting thinking it might be better to preserve the existing
behavior (emitting the message once per wait) regardless of whether
client_connection_check_interval is set, and implement that first.

If there is a need to emit the message periodically, we could add that
as a separate feature later so that it works independently of
the client_connection_check_interval setting.

Thought?

Regards,

-- 
Fujii Masao

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-09 18:18  Hüseyin Demir <[email protected]>
  parent: Fujii Masao <[email protected]>
  1 sibling, 0 replies; 22+ messages in thread

From: Hüseyin Demir @ 2026-03-09 18:18 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: [email protected]

Hi,

Fujii Masao <[email protected]>, 9 Mar 2026 Pzt, 15:12 tarihinde şunu
yazdı:

> Or perhaps they expect the log message to be emitted only once,
> just after deadlock_timeout, similar to the current behavior when
> client_connection_check_interval is not set, I guess.
>
> I'm now starting thinking it might be better to preserve the existing
> behavior (emitting the message once per wait) regardless of whether
> client_connection_check_interval is set, and implement that first.
>
> If there is a need to emit the message periodically, we could add that
> as a separate feature later so that it works independently of
> the client_connection_check_interval setting.
>
>
+1 to this idea. It would be a better approach in the future if we need to
change the behaviour of emitting logs about these topics.

I do see the trade-off. Put simply with only one message, we can lose
visibility into long lock waits. But I think that's a separate concern. If
there's a real need for periodic "still waiting" messages in the future, we
could introduce a dedicated GUC (something like log_lock_waits_interval) or
even a simple constant to control that independently of
client_connection_check_interval. That way deadlock detection, connection
checking, and lock-wait logging each have their own rules and don't
interfere with each other.

Regards.

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-13 12:36  Fujii Masao <[email protected]>
  parent: Fujii Masao <[email protected]>
  1 sibling, 2 replies; 22+ messages in thread

From: Fujii Masao @ 2026-03-13 12:36 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Hüseyin Demir <[email protected]>; [email protected]

On Tue, Mar 10, 2026 at 10:42 AM Chao Li <[email protected]> wrote:
>
>
>
> > On Mar 9, 2026, at 22:12, Fujii Masao <[email protected]> wrote:
> >
> > On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <[email protected]> wrote:
> >>
> >> Hi Fujii,
> >>
> >> Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
> >>
> >> 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as the rate limit interval is worth adopting. If someone has set deadlock_timeout to, say, 30s or 60s, they've already signaled they don't need frequent lock-wait feedback. Logging every 10s after a 60s deadlock_timeout feels inconsistent with that intent.
> >
> > Or perhaps they expect the log message to be emitted only once,
> > just after deadlock_timeout, similar to the current behavior when
> > client_connection_check_interval is not set, I guess.
> >
> > I'm now starting thinking it might be better to preserve the existing
> > behavior (emitting the message once per wait) regardless of whether
> > client_connection_check_interval is set, and implement that first.
> >
> > If there is a need to emit the message periodically, we could add that
> > as a separate feature later so that it works independently of
> > the client_connection_check_interval setting.
> >
> > Thought?
>
> Yeah, IMHO, preserving the existing behavior is preferable. Logically, client_connection_check_interval and log_lock_waitsbelong to two different departments. Even though they cross paths at the implementation level today, having the behavior of log_lock_waits change just because client_connection_check_interval is adjusted seems surprising.

So, attached is a patch that ensures the "still waiting on lock" message is
reported at most once during a lock wait, even if the wait is interrupted.

Regards,

-- 
Fujii Masao


Attachments:

  [application/octet-stream] v2-0001-Ensure-still-waiting-on-lock-message-is-logged-on.patch (3.1K, 2-v2-0001-Ensure-still-waiting-on-lock-message-is-logged-on.patch)
  download | inline diff:
From a3dcbbef374e24ab895e50059cf40fa8a6358dfa Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Fri, 13 Mar 2026 14:37:43 +0900
Subject: [PATCH v2] Ensure "still waiting on lock" message is logged only once
 per wait.

When log_lock_waits is enabled, the "still waiting on lock" message is normally
emitted only once while a session continues waiting. However, if the wait is
interrupted, for example by wakeups from client_connection_check_interval,
SIGHUP for configuration reloads, or similar events, the message could be
emitted again each time the wait resumes.

For example, with very small client_connection_check_interval values
(e.g., 100 ms), this behavior could flood the logs with repeated messages,
making them difficult to use.

To prevent this, this commit guards the "still waiting on lock" message so
it is reported at most once during a lock wait, even if the wait is interrupted.
This preserves the intended behavior when no interrupts occur.
---
 src/backend/storage/lmgr/proc.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d407725e602..5fbc0dfb198 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1309,6 +1309,7 @@ ProcSleep(LOCALLOCK *locallock)
 	TimestampTz standbyWaitStart = 0;
 	bool		allow_autovacuum_cancel = true;
 	bool		logged_recovery_conflict = false;
+	bool		logged_lock_waits = false;
 	ProcWaitStatus myWaitStatus;
 	DeadLockState deadlock_state;
 
@@ -1606,12 +1607,29 @@ ProcSleep(LOCALLOCK *locallock)
 			}
 
 			if (myWaitStatus == PROC_WAIT_STATUS_WAITING)
-				ereport(LOG,
-						(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs),
-						 (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.",
-											   "Processes holding the lock: %s. Wait queue: %s.",
-											   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
+			{
+				/*
+				 * Guard the "still waiting on lock" log message so it is
+				 * reported at most once while waiting for the lock.
+				 *
+				 * Without this guard, the message can be emitted whenever the
+				 * lock-wait sleep is interrupted (for example by SIGHUP for
+				 * config reload or by client_connection_check_interval). For
+				 * example, if client_connection_check_interval is set very
+				 * low (e.g., 100 ms), the message could be logged repeatedly,
+				 * flooding the log and making it difficult to use.
+				 */
+				if (!logged_lock_waits)
+				{
+					ereport(LOG,
+							(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
+									MyProcPid, modename, buf.data, msecs, usecs),
+							 (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.",
+												   "Processes holding the lock: %s. Wait queue: %s.",
+												   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
+					logged_lock_waits = true;
+				}
+			}
 			else if (myWaitStatus == PROC_WAIT_STATUS_OK)
 				ereport(LOG,
 						(errmsg("process %d acquired %s on %s after %ld.%03d ms",
-- 
2.51.2



^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-16 01:36  Chao Li <[email protected]>
  parent: Fujii Masao <[email protected]>
  1 sibling, 1 reply; 22+ messages in thread

From: Chao Li @ 2026-03-16 01:36 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Hüseyin Demir <[email protected]>; [email protected]



> On Mar 13, 2026, at 20:36, Fujii Masao <[email protected]> wrote:
> 
> On Tue, Mar 10, 2026 at 10:42 AM Chao Li <[email protected]> wrote:
>> 
>> 
>> 
>>> On Mar 9, 2026, at 22:12, Fujii Masao <[email protected]> wrote:
>>> 
>>> On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <[email protected]> wrote:
>>>> 
>>>> Hi Fujii,
>>>> 
>>>> Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
>>>> 
>>>> 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as the rate limit interval is worth adopting. If someone has set deadlock_timeout to, say, 30s or 60s, they've already signaled they don't need frequent lock-wait feedback. Logging every 10s after a 60s deadlock_timeout feels inconsistent with that intent.
>>> 
>>> Or perhaps they expect the log message to be emitted only once,
>>> just after deadlock_timeout, similar to the current behavior when
>>> client_connection_check_interval is not set, I guess.
>>> 
>>> I'm now starting thinking it might be better to preserve the existing
>>> behavior (emitting the message once per wait) regardless of whether
>>> client_connection_check_interval is set, and implement that first.
>>> 
>>> If there is a need to emit the message periodically, we could add that
>>> as a separate feature later so that it works independently of
>>> the client_connection_check_interval setting.
>>> 
>>> Thought?
>> 
>> Yeah, IMHO, preserving the existing behavior is preferable. Logically, client_connection_check_interval and log_lock_waitsbelong to two different departments. Even though they cross paths at the implementation level today, having the behavior of log_lock_waits change just because client_connection_check_interval is adjusted seems surprising.
> 
> So, attached is a patch that ensures the "still waiting on lock" message is
> reported at most once during a lock wait, even if the wait is interrupted.
> 
> Regards,
> 
> -- 
> Fujii Masao
> <v2-0001-Ensure-still-waiting-on-lock-message-is-logged-on.patch>

V2 overall looks good to me.

A small comment is about the variable name logged_lock_waits that sounds like “count of waits”, I would suggest “lock_wait_logged”. But I see that the name follows the naming convention of the existing variable logged_recovery_conflict, so maybe just rename to logged_lock_wait (remove the “s”).

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-16 07:04  Hüseyin Demir <[email protected]>
  parent: Fujii Masao <[email protected]>
  1 sibling, 1 reply; 22+ messages in thread

From: Hüseyin Demir @ 2026-03-16 07:04 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Chao Li <[email protected]>; [email protected]

Hi,

Fujii Masao <[email protected]>, 13 Mar 2026 Cum, 13:36 tarihinde
şunu yazdı:
>
> On Tue, Mar 10, 2026 at 10:42 AM Chao Li <[email protected]> wrote:
> >
> >
> >
> > > On Mar 9, 2026, at 22:12, Fujii Masao <[email protected]> wrote:
> > >
> > > On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <[email protected]> wrote:
> > >>
> > >> Hi Fujii,
> > >>
> > >> Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
> > >>
> > >> 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as the rate limit interval is worth adopting. If someone has set deadlock_timeout to, say, 30s or 60s, they've already signaled they don't need frequent lock-wait feedback. Logging every 10s after a 60s deadlock_timeout feels inconsistent with that intent.
> > >
> > > Or perhaps they expect the log message to be emitted only once,
> > > just after deadlock_timeout, similar to the current behavior when
> > > client_connection_check_interval is not set, I guess.
> > >
> > > I'm now starting thinking it might be better to preserve the existing
> > > behavior (emitting the message once per wait) regardless of whether
> > > client_connection_check_interval is set, and implement that first.
> > >
> > > If there is a need to emit the message periodically, we could add that
> > > as a separate feature later so that it works independently of
> > > the client_connection_check_interval setting.
> > >
> > > Thought?
> >
> > Yeah, IMHO, preserving the existing behavior is preferable. Logically, client_connection_check_interval and log_lock_waitsbelong to two different departments. Even though they cross paths at the implementation level today, having the behavior of log_lock_waits change just because client_connection_check_interval is adjusted seems surprising.
>
> So, attached is a patch that ensures the "still waiting on lock" message is
> reported at most once during a lock wait, even if the wait is interrupted.
>

The new v2 patch looks good to me.

One open question from my side is should we include a test for this
behaviour ? Because we mentioned adding a different GUC in the future
to manage this rate-limiting approach. It can be useful in the future
once we consider/re-visit this approach. If the tests and other future
ideas can be developed later together we can consider adding tests
later.

Thanks for the patch again!

Regards.





^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-16 09:21  Fujii Masao <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Fujii Masao @ 2026-03-16 09:21 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Hüseyin Demir <[email protected]>; [email protected]

On Mon, Mar 16, 2026 at 10:37 AM Chao Li <[email protected]> wrote:
> V2 overall looks good to me.
>
> A small comment is about the variable name logged_lock_waits that sounds like “count of waits”, I would suggest “lock_wait_logged”. But I see that the name follows the naming convention of the existing variable logged_recovery_conflict, so maybe just rename to logged_lock_wait (remove the “s”).
>

TBH I'm not sure whether renaming the variable that way is a good idea,
but I don't have a strong opinion about this. So I renamed it to
logged_lock_wait (i.e., removed the trailing "s").

I've pushed the patch. Thanks!

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-16 09:22  Fujii Masao <[email protected]>
  parent: Hüseyin Demir <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Fujii Masao @ 2026-03-16 09:22 UTC (permalink / raw)
  To: Hüseyin Demir <[email protected]>; +Cc: Chao Li <[email protected]>; [email protected]

On Mon, Mar 16, 2026 at 4:05 PM Hüseyin Demir <[email protected]> wrote:
> The new v2 patch looks good to me.
>
> One open question from my side is should we include a test for this
> behaviour ? Because we mentioned adding a different GUC in the future
> to manage this rate-limiting approach. It can be useful in the future
> once we consider/re-visit this approach. If the tests and other future
> ideas can be developed later together we can consider adding tests
> later.

I agree it's worth adding such tests. From a quick look at the regression tests,
there don't seem to be any tests for log_lock_waits itself. So before adding
tests for the behavior introduced by the patch, we might first need to add
some basic tests for log_lock_waits.

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-03-17 19:59  Hüseyin Demir <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Hüseyin Demir @ 2026-03-17 19:59 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Chao Li <[email protected]>; [email protected]

Fujii Masao <[email protected]>, 16 Mar 2026 Pzt, 10:22 tarihinde
şunu yazdı:
>
> On Mon, Mar 16, 2026 at 4:05 PM Hüseyin Demir <[email protected]> wrote:
> > The new v2 patch looks good to me.
> >
> > One open question from my side is should we include a test for this
> > behaviour ? Because we mentioned adding a different GUC in the future
> > to manage this rate-limiting approach. It can be useful in the future
> > once we consider/re-visit this approach. If the tests and other future
> > ideas can be developed later together we can consider adding tests
> > later.
>
> I agree it's worth adding such tests. From a quick look at the regression tests,
> there don't seem to be any tests for log_lock_waits itself. So before adding
> tests for the behavior introduced by the patch, we might first need to add
> some basic tests for log_lock_waits.
>
> Regards,
>
> --
> Fujii Masao

I created a regression test for the behaviour we mentioned before
introducing further possible changes.

You can review it. Basically I tried to simulate the desired behaviour
which the current patch introduced.

Regards.


Attachments:

  [application/octet-stream] v1-0001-add-regression-tests-for-still-waiting-on-lock-log-message.patch (5.7K, 2-v1-0001-add-regression-tests-for-still-waiting-on-lock-log-message.patch)
  download | inline diff:
From 4d1ccefbe7aa29ff35699be4a063a691c40e8c94 Mon Sep 17 00:00:00 2001
From: Huseyin Demir <[email protected]>
Date: Tue, 17 Mar 2026 20:28:06 +0100
Subject: [PATCH] Add TAP test for log_lock_waits "still waiting" once-per-wait
 behavior.

Adds src/test/modules/test_misc/t/011_log_lock_waits.pl, which verifies that
the "still waiting on lock" message is emitted exactly once per lock wait even
when the lock-wait sleep is interrupted frequently by a very short
client_connection_check_interval.

Also registers the test in src/test/modules/test_misc/meson.build.
---
 src/test/modules/test_misc/meson.build        |   1 +
 .../modules/test_misc/t/011_log_lock_waits.pl | 110 ++++++++++++++++++
 2 files changed, 111 insertions(+)
 create mode 100755 src/test/modules/test_misc/t/011_log_lock_waits.pl

diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build
index 6e8db1621a7..0c8a9e71ba9 100644
--- a/src/test/modules/test_misc/meson.build
+++ b/src/test/modules/test_misc/meson.build
@@ -19,6 +19,7 @@ tests += {
       't/008_replslot_single_user.pl',
       't/009_log_temp_files.pl',
       't/010_index_concurrently_upsert.pl',
+      't/011_log_lock_waits.pl',
     ],
     # The injection points are cluster-wide, so disable installcheck
     'runningcheck': false,
diff --git a/src/test/modules/test_misc/t/011_log_lock_waits.pl b/src/test/modules/test_misc/t/011_log_lock_waits.pl
new file mode 100755
index 00000000000..bf5df9e4d92
--- /dev/null
+++ b/src/test/modules/test_misc/t/011_log_lock_waits.pl
@@ -0,0 +1,110 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Verify that log_lock_waits emits the "still waiting" message exactly once
+# per lock wait, even when the lock-wait sleep is interrupted by SIGHUP
+# (e.g. via pg_reload_conf()).
+#
+# Background: after deadlock_timeout fires, deadlock_state is set to
+# DS_NO_DEADLOCK, which is != DS_NOT_YET_CHECKED.  Every subsequent wakeup
+# from WaitLatch() re-enters the logging block.  Without the fix, a SIGHUP
+# wakeup (config reload) would re-emit "still waiting" each time it occurred.
+# With the fix a boolean flag ensures the message is logged at most once per
+# ProcSleep() invocation.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf(
+	'postgresql.conf', qq(
+log_lock_waits = on
+deadlock_timeout = '200ms'
+));
+$node->start;
+
+$node->safe_psql('postgres', 'CREATE TABLE lock_test (id int)');
+
+# Session A: hold ACCESS EXCLUSIVE lock
+my $sess_a = $node->background_psql('postgres');
+$sess_a->query_safe(
+	"BEGIN; LOCK TABLE lock_test IN ACCESS EXCLUSIVE MODE;");
+
+# Record log offset before Session B tries to acquire the same lock
+my $log_offset = -s $node->logfile;
+
+# Session B: will block on the same lock.
+# Echo a marker before the blocking LOCK so query_until can return without
+# waiting for lock acquisition.  The LOCK command runs (blocking) in the
+# background.
+my $sess_b = $node->background_psql('postgres');
+$sess_b->query_until(
+	qr/blocking_on_lock/,
+	"SET log_lock_waits = on;\n"
+	  . "SET deadlock_timeout = '200ms';\n"
+	  . "BEGIN;\n"
+	  . "\\echo blocking_on_lock\n"
+	  . "LOCK TABLE lock_test IN ACCESS EXCLUSIVE MODE;\n");
+
+# Wait until Session B is actually waiting for the lock
+ok( $node->poll_query_until(
+		'postgres',
+		"SELECT count(*) = 1 FROM pg_stat_activity "
+		  . "WHERE wait_event_type = 'Lock' AND state = 'active';"),
+	"session B is waiting for the lock");
+
+# Wait for deadlock_timeout to fire so that "still waiting" is emitted once
+# and deadlock_state transitions out of DS_NOT_YET_CHECKED.
+$node->wait_for_log(qr/still waiting for/, $log_offset);
+
+# Send two SIGHUP wakeups via pg_reload_conf().  Each one wakes up Session B's
+# WaitLatch() call.  Without the fix, each wakeup would re-enter the logging
+# block (deadlock_state is now DS_NO_DEADLOCK != DS_NOT_YET_CHECKED) and
+# emit "still waiting" again.  With the fix, logged_lock_waits=true suppresses
+# the repeated message.
+#
+# After each reload, poll until B is back in the lock-wait state so we know
+# the SIGHUP wakeup was fully processed before the next one is sent.
+$node->safe_psql('postgres', 'SELECT pg_reload_conf()');
+$node->poll_query_until(
+	'postgres',
+	"SELECT count(*) = 1 FROM pg_stat_activity "
+	  . "WHERE wait_event_type = 'Lock' AND state = 'active';");
+
+$node->safe_psql('postgres', 'SELECT pg_reload_conf()');
+$node->poll_query_until(
+	'postgres',
+	"SELECT count(*) = 1 FROM pg_stat_activity "
+	  . "WHERE wait_event_type = 'Lock' AND state = 'active';");
+
+# Release the lock so Session B can proceed
+$sess_a->query_safe("COMMIT;");
+
+# Wait for Session B to acquire the lock (signalled by "acquired" in log)
+$node->wait_for_log(qr/acquired AccessExclusiveLock/, $log_offset);
+
+$sess_b->quit;
+$sess_a->quit;
+
+# Count "still waiting" messages in the relevant log segment
+my $log_contents =
+  PostgreSQL::Test::Utils::slurp_file($node->logfile, $log_offset);
+my @still_waiting = ($log_contents =~ /still waiting for/g);
+my @acquired      = ($log_contents =~ /acquired AccessExclusiveLock/g);
+
+# With the fix: exactly 1 "still waiting" (fired after deadlock_timeout),
+# then suppressed for both subsequent pg_reload_conf() wakeups.
+# Without the fix: 3 messages (one per wakeup that re-entered the logging
+# block).
+is(scalar @still_waiting, 1,
+	"still waiting logged exactly once despite pg_reload_conf() wakeups");
+
+# Sanity check: the "acquired" message must appear exactly once
+is(scalar @acquired, 1, "acquired logged once when lock is granted");
+
+$node->stop('fast');
+
+done_testing();
-- 
2.50.1 (Apple Git-155)



^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-04-06 11:09  Fujii Masao <[email protected]>
  parent: Hüseyin Demir <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Fujii Masao @ 2026-04-06 11:09 UTC (permalink / raw)
  To: Hüseyin Demir <[email protected]>; +Cc: Chao Li <[email protected]>; [email protected]

On Wed, Mar 18, 2026 at 4:59 AM Hüseyin Demir <[email protected]> wrote:
> I created a regression test for the behaviour we mentioned before
> introducing further possible changes.
>
> You can review it. Basically I tried to simulate the desired behaviour
> which the current patch introduced.

Thanks for the patch!

Commit 557a9f1e3e6 recently added test for lock stats, which also causes
lock wait situations. So it seems better to extend that test to cover
log_lock_waits rather than adding a new TAP test file.

I've prepared a patch to do this. Patch attached. Thought?

Regards,


Attachments:

  [application/octet-stream] v2-0001-Add-TAP-tests-for-log_lock_waits.patch (5.7K, 2-v2-0001-Add-TAP-tests-for-log_lock_waits.patch)
  download | inline diff:
From 3741cbdfa3a9dbcc883daaeae06891621e844c3c Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Mon, 6 Apr 2026 15:14:34 +0900
Subject: [PATCH v2] Add TAP tests for log_lock_waits

This commit updates 011_lock_stats.pl to verify log_lock_waits behavior.

The tests check that messages are emitted both when a wait occurs and
when the lock is acquired, and that the "still waiting for" message is logged
exactly once per wait, even if the backend wakes up during the wait.

The latter covers the behavior introduced by commit fd6ecbfa75f.
---
 .../modules/test_misc/t/011_lock_stats.pl     | 64 ++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/src/test/modules/test_misc/t/011_lock_stats.pl b/src/test/modules/test_misc/t/011_lock_stats.pl
index 58a0046a52c..9910db6bf4e 100644
--- a/src/test/modules/test_misc/t/011_lock_stats.pl
+++ b/src/test/modules/test_misc/t/011_lock_stats.pl
@@ -1,13 +1,18 @@
 
 # Copyright (c) 2026, PostgreSQL Global Development Group
 
-# Test for the lock statistics
+# Test for the lock statistics and log_lock_waits
 #
 # This test creates multiple locking situations when a session (s2) has to
 # wait on a lock for longer than deadlock_timeout. The first tests each test a
 # dedicated lock type.
 # The last one checks that log_lock_waits has no impact on the statistics
 # counters.
+#
+# This test also checks that log_lock_waits messages are emitted both when
+# a wait occurs and when the lock is acquired, and that the "still waiting for"
+# message is logged exactly once per wait, even if the backend wakes due
+# to signals.
 
 use strict;
 use warnings FATAL => 'all';
@@ -90,6 +95,8 @@ INSERT INTO test_stat_tab(key, value) VALUES('k0', 1);
 
 setup_sessions();
 
+my $log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -112,6 +119,18 @@ LOCK TABLE test_stat_tab;
 
 wait_and_detach($node, 'deadlock-timeout-fired');
 
+# Check that log_lock_waits message is emitted during a lock wait.
+$node->wait_for_log(qr/still waiting for AccessExclusiveLock on relation/,
+	$log_offset);
+
+# Wake the backend waiting on the lock. Confirm it woke by calling
+# pg_log_backend_memory_contexts() and checking for the logged memory contexts.
+$node->safe_psql(
+	'postgres', q[SELECT pg_log_backend_memory_contexts(pid)
+	FROM pg_locks WHERE locktype = 'relation' AND
+	relation = 'test_stat_tab'::regclass AND NOT granted;]);
+$node->wait_for_log(qr/logging memory contexts/, $log_offset);
+
 # deadlock_timeout fired, now commit in s1 and s2
 $s1->query_safe(q(COMMIT));
 $s2->query_safe(q(COMMIT));
@@ -120,6 +139,18 @@ $s2->query_safe(q(COMMIT));
 wait_for_pg_stat_lock($node, 'relation');
 ok(1, "Lock stats ok for relation");
 
+# Check that log_lock_waits message is emitted when the lock is acquired
+# after waiting.
+$node->wait_for_log(qr/acquired AccessExclusiveLock on relation/,
+	$log_offset);
+
+# Check that the "still waiting for" message is logged exactly once per wait,
+# even if the backend wakes during the wait.
+my $log_contents = slurp_file($node->logfile, $log_offset);
+my @still_waiting = ($log_contents =~ /still waiting for/g);
+is(scalar @still_waiting,
+	1, "still waiting logged exactly once despite pg_reload_conf() wakeups");
+
 # close sessions
 $s1->quit;
 $s2->quit;
@@ -128,6 +159,8 @@ $s2->quit;
 
 setup_sessions();
 
+$log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -152,6 +185,10 @@ UPDATE test_stat_tab SET value = value + 1 WHERE key = 'k1';
 
 wait_and_detach($node, 'deadlock-timeout-fired');
 
+# Check that log_lock_waits message is emitted during a lock wait.
+$node->wait_for_log(qr/still waiting for ShareLock on transaction/,
+	$log_offset);
+
 # deadlock_timeout fired, now commit in s1 and s2
 $s1->query_safe(q(COMMIT));
 $s2->query_safe(q(COMMIT));
@@ -160,6 +197,10 @@ $s2->query_safe(q(COMMIT));
 wait_for_pg_stat_lock($node, 'transactionid');
 ok(1, "Lock stats ok for transactionid");
 
+# Check that log_lock_waits message is emitted when the lock is acquired
+# after waiting.
+$node->wait_for_log(qr/acquired ShareLock on transaction/, $log_offset);
+
 # Close sessions
 $s1->quit;
 $s2->quit;
@@ -168,6 +209,8 @@ $s2->quit;
 
 setup_sessions();
 
+$log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -190,6 +233,10 @@ SELECT pg_advisory_lock(1);
 
 wait_and_detach($node, 'deadlock-timeout-fired');
 
+# Check that log_lock_waits message is emitted during a lock wait.
+$node->wait_for_log(qr/still waiting for ExclusiveLock on advisory lock/,
+	$log_offset);
+
 # deadlock_timeout fired, now unlock and commit s2
 $s1->query_safe(q(SELECT pg_advisory_unlock(1)));
 $s2->query_safe(
@@ -202,6 +249,10 @@ COMMIT;
 wait_for_pg_stat_lock($node, 'advisory');
 ok(1, "Lock stats ok for advisory");
 
+# Check that log_lock_waits message is emitted when the lock is acquired
+# after waiting.
+$node->wait_for_log(qr/acquired ExclusiveLock on advisory lock/, $log_offset);
+
 # Close sessions
 $s1->quit;
 $s2->quit;
@@ -210,6 +261,8 @@ $s2->quit;
 
 setup_sessions();
 
+$log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -241,6 +294,15 @@ $s2->query_safe(q(COMMIT));
 wait_for_pg_stat_lock($node, 'relation');
 ok(1, "log_lock_waits has no impact on Lock stats");
 
+# Check that no log_lock_waits messages are emitted
+ok( !$node->log_contains(
+		"still waiting for AccessExclusiveLock on relation", $log_offset),
+	'check that no log_lock_waits message is emitted during a lock wait');
+ok( !$node->log_contains(
+		"acquired AccessExclusiveLock on relation", $log_offset),
+	'check that log_lock_waits message is emitted when the lock is acquired after waiting'
+);
+
 # close sessions
 $s1->quit;
 $s2->quit;
-- 
2.51.2



^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-04-06 12:59  Hüseyin Demir <[email protected]>
  parent: Fujii Masao <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Hüseyin Demir @ 2026-04-06 12:59 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: Chao Li <[email protected]>; [email protected]

Hi,



> Thanks for the patch!
>
> Commit 557a9f1e3e6 recently added test for lock stats, which also causes
> lock wait situations. So it seems better to extend that test to cover
> log_lock_waits rather than adding a new TAP test file.
>
> I've prepared a patch to do this. Patch attached. Thought?
>
> Regards,
>

Appreciated for the patch. I reviewed it quickly.

In the test description it says that `still waiting logged exactly once
despite pg_reload_conf() wakeups")` but the test sends via
pg_log_backend_memory_contexts(). It would make sense to update it.

```
  -  1, "still waiting logged exactly once despite pg_reload_conf()
wakeups");
  +  1, "still waiting logged exactly once despite wakeups from
pg_log_backend_memory_contexts()");
```

Secondly, before finishing the test it tries to check that no
log_lock_waits messages are emitted. But the comment has the opposite
meaning.

```
  - 'check that log_lock_waits message is emitted when the lock is acquired
after waiting'
  + 'check that no log_lock_waits message is emitted when the lock is
acquired after waiting'
```

I'm not sure they need to change but these are the only topics I wanted to
add. Otherwise, lgtm and thanks.

I attached the v3 to convey my ideas. You can use it or update the existing
if you think the suggestions are reasonable.


Regards.


Attachments:

  [application/octet-stream] v3-0001-Add-TAP-tests-for-log_lock_waits.patch (5.1K, 3-v3-0001-Add-TAP-tests-for-log_lock_waits.patch)
  download | inline diff:
diff --git a/src/test/modules/test_misc/t/011_lock_stats.pl b/src/test/modules/test_misc/t/011_lock_stats.pl
index 58a0046a52c..d95fcd4bb4c 100644
--- a/src/test/modules/test_misc/t/011_lock_stats.pl
+++ b/src/test/modules/test_misc/t/011_lock_stats.pl
@@ -1,13 +1,18 @@
 
 # Copyright (c) 2026, PostgreSQL Global Development Group
 
-# Test for the lock statistics
+# Test for the lock statistics and log_lock_waits
 #
 # This test creates multiple locking situations when a session (s2) has to
 # wait on a lock for longer than deadlock_timeout. The first tests each test a
 # dedicated lock type.
 # The last one checks that log_lock_waits has no impact on the statistics
 # counters.
+#
+# This test also checks that log_lock_waits messages are emitted both when
+# a wait occurs and when the lock is acquired, and that the "still waiting for"
+# message is logged exactly once per wait, even if the backend wakes due
+# to signals.
 
 use strict;
 use warnings FATAL => 'all';
@@ -90,6 +95,8 @@ INSERT INTO test_stat_tab(key, value) VALUES('k0', 1);
 
 setup_sessions();
 
+my $log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -112,6 +119,18 @@ LOCK TABLE test_stat_tab;
 
 wait_and_detach($node, 'deadlock-timeout-fired');
 
+# Check that log_lock_waits message is emitted during a lock wait.
+$node->wait_for_log(qr/still waiting for AccessExclusiveLock on relation/,
+	$log_offset);
+
+# Wake the backend waiting on the lock. Confirm it woke by calling
+# pg_log_backend_memory_contexts() and checking for the logged memory contexts.
+$node->safe_psql(
+	'postgres', q[SELECT pg_log_backend_memory_contexts(pid)
+	FROM pg_locks WHERE locktype = 'relation' AND
+	relation = 'test_stat_tab'::regclass AND NOT granted;]);
+$node->wait_for_log(qr/logging memory contexts/, $log_offset);
+
 # deadlock_timeout fired, now commit in s1 and s2
 $s1->query_safe(q(COMMIT));
 $s2->query_safe(q(COMMIT));
@@ -120,6 +139,18 @@ $s2->query_safe(q(COMMIT));
 wait_for_pg_stat_lock($node, 'relation');
 ok(1, "Lock stats ok for relation");
 
+# Check that log_lock_waits message is emitted when the lock is acquired
+# after waiting.
+$node->wait_for_log(qr/acquired AccessExclusiveLock on relation/,
+	$log_offset);
+
+# Check that the "still waiting for" message is logged exactly once per wait,
+# even if the backend wakes during the wait.
+my $log_contents = slurp_file($node->logfile, $log_offset);
+my @still_waiting = ($log_contents =~ /still waiting for/g);
+is(scalar @still_waiting,
+	1, "still waiting logged exactly once despite wakeups from pg_log_backend_memory_contexts()");
+
 # close sessions
 $s1->quit;
 $s2->quit;
@@ -128,6 +159,8 @@ $s2->quit;
 
 setup_sessions();
 
+$log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -152,6 +185,10 @@ UPDATE test_stat_tab SET value = value + 1 WHERE key = 'k1';
 
 wait_and_detach($node, 'deadlock-timeout-fired');
 
+# Check that log_lock_waits message is emitted during a lock wait.
+$node->wait_for_log(qr/still waiting for ShareLock on transaction/,
+	$log_offset);
+
 # deadlock_timeout fired, now commit in s1 and s2
 $s1->query_safe(q(COMMIT));
 $s2->query_safe(q(COMMIT));
@@ -160,6 +197,10 @@ $s2->query_safe(q(COMMIT));
 wait_for_pg_stat_lock($node, 'transactionid');
 ok(1, "Lock stats ok for transactionid");
 
+# Check that log_lock_waits message is emitted when the lock is acquired
+# after waiting.
+$node->wait_for_log(qr/acquired ShareLock on transaction/, $log_offset);
+
 # Close sessions
 $s1->quit;
 $s2->quit;
@@ -168,6 +209,8 @@ $s2->quit;
 
 setup_sessions();
 
+$log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -190,6 +233,10 @@ SELECT pg_advisory_lock(1);
 
 wait_and_detach($node, 'deadlock-timeout-fired');
 
+# Check that log_lock_waits message is emitted during a lock wait.
+$node->wait_for_log(qr/still waiting for ExclusiveLock on advisory lock/,
+	$log_offset);
+
 # deadlock_timeout fired, now unlock and commit s2
 $s1->query_safe(q(SELECT pg_advisory_unlock(1)));
 $s2->query_safe(
@@ -202,6 +249,10 @@ COMMIT;
 wait_for_pg_stat_lock($node, 'advisory');
 ok(1, "Lock stats ok for advisory");
 
+# Check that log_lock_waits message is emitted when the lock is acquired
+# after waiting.
+$node->wait_for_log(qr/acquired ExclusiveLock on advisory lock/, $log_offset);
+
 # Close sessions
 $s1->quit;
 $s2->quit;
@@ -210,6 +261,8 @@ $s2->quit;
 
 setup_sessions();
 
+$log_offset = -s $node->logfile;
+
 $s1->query_safe(
 	q[
 SELECT pg_stat_reset_shared('lock');
@@ -241,6 +294,15 @@ $s2->query_safe(q(COMMIT));
 wait_for_pg_stat_lock($node, 'relation');
 ok(1, "log_lock_waits has no impact on Lock stats");
 
+# Check that no log_lock_waits messages are emitted
+ok( !$node->log_contains(
+		"still waiting for AccessExclusiveLock on relation", $log_offset),
+	'check that no log_lock_waits message is emitted during a lock wait');
+ok( !$node->log_contains(
+		"acquired AccessExclusiveLock on relation", $log_offset),
+	'check that no log_lock_waits message is emitted when the lock is acquired after waiting'
+);
+
 # close sessions
 $s1->quit;
 $s2->quit;


^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: client_connection_check_interval default value
@ 2026-04-06 14:55  Fujii Masao <[email protected]>
  parent: Hüseyin Demir <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Fujii Masao @ 2026-04-06 14:55 UTC (permalink / raw)
  To: Hüseyin Demir <[email protected]>; +Cc: Chao Li <[email protected]>; [email protected]

On Mon, Apr 6, 2026 at 10:00 PM Hüseyin Demir <[email protected]> wrote:
> Appreciated for the patch. I reviewed it quickly.
>
> In the test description it says that `still waiting logged exactly once despite pg_reload_conf() wakeups")` but the test sends via pg_log_backend_memory_contexts(). It would make sense to update it.
>
> ```
>   -  1, "still waiting logged exactly once despite pg_reload_conf() wakeups");
>   +  1, "still waiting logged exactly once despite wakeups from pg_log_backend_memory_contexts()");
> ```
>
> Secondly, before finishing the test it tries to check that no log_lock_waits messages are emitted. But the comment has the opposite meaning.
>
> ```
>   - 'check that log_lock_waits message is emitted when the lock is acquired after waiting'
>   + 'check that no log_lock_waits message is emitted when the lock is acquired after waiting'
> ```
>
> I'm not sure they need to change but these are the only topics I wanted to add. Otherwise, lgtm and thanks.
>
> I attached the v3 to convey my ideas. You can use it or update the existing if you think the suggestions are reasonable.

Thanks for the review and for updating the patch!
Your changes look good to me.

I also added a comment explaining why the test wakes the backend,
and then pushed the patch. Thanks again!

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 22+ messages in thread

end of thread, other threads:[~2026-04-06 14:55 UTC | newest]

Thread overview: 22+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-05 23:42 Re: client_connection_check_interval default value Fujii Masao <[email protected]>
2026-02-06 00:01 ` Tom Lane <[email protected]>
2026-02-06 00:18   ` Jacob Champion <[email protected]>
2026-02-06 01:36     ` Thomas Munro <[email protected]>
2026-02-06 01:02   ` Jeremy Schneider <[email protected]>
2026-02-06 09:40   ` Laurenz Albe <[email protected]>
2026-02-18 05:30   ` Fujii Masao <[email protected]>
2026-02-18 08:03     ` Laurenz Albe <[email protected]>
2026-02-18 09:00       ` Ants Aasma <[email protected]>
2026-02-26 01:30         ` Fujii Masao <[email protected]>
2026-03-09 14:12           ` Fujii Masao <[email protected]>
2026-03-09 18:18             ` Hüseyin Demir <[email protected]>
2026-03-13 12:36             ` Fujii Masao <[email protected]>
2026-03-16 01:36               ` Chao Li <[email protected]>
2026-03-16 09:21                 ` Fujii Masao <[email protected]>
2026-03-16 07:04               ` Hüseyin Demir <[email protected]>
2026-03-16 09:22                 ` Fujii Masao <[email protected]>
2026-03-17 19:59                   ` Hüseyin Demir <[email protected]>
2026-04-06 11:09                     ` Fujii Masao <[email protected]>
2026-04-06 12:59                       ` Hüseyin Demir <[email protected]>
2026-04-06 14:55                         ` Fujii Masao <[email protected]>
2026-02-24 06:01     ` Chao Li <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox