public inbox for [email protected]
client_connection_check_interval default value
5+ messages / 3 participants
* client_connection_check_interval default value
From: Jeremy Schneider @ 2026-02-05 05:30 UTC
To: [email protected] <[email protected]>
Cc: Marat Buharov <[email protected]>

What would people here think about changing the default value of client_connection_check_interval to 2000 ms? Right now this is disabled by default.

The background is that I recently saw an incident where a blocking-lock brownout escalated from a row-level problem to a complete system outage, due to a combination of factors including a bug in golang's pgx postgres client (PR 2481 has now been merged with a fix) and a pgbouncer setup that was missing peers configuration. As a result, cancel messages were getting dropped while postgres connections were waiting on a blocked lock, golang aggressively timed out on context deadlines and retried, and once the database reached max_connections the whole system ground to a halt.

At the time I thought it was weird that postgres wasn't checking for dead connections while those conns were waiting for locks; I spent a bunch of time investigating this, reproduced it, and wrote up what I was able to figure out. Then, yesterday, I saw a LinkedIn post from Marat at Data Egret who mentioned that client_connection_check_interval exists. I plugged this into my repro and confirmed it can prevent postgres from escalating the blocking-lock brownout into a complete outage due to connection exhaustion.
While a fix has been merged in pgx for the most direct root cause of the incident I saw, this setting just seems like a good behavior to make Postgres more robust in general. 2000 ms seemed like a fairly safe/conservative starting point for discussion. Thoughts?

-Jeremy

PS. Some more details and graphs are at https://ardentperf.com/2026/02/04/postgres-client_connection_check_interval/

--
To know the thoughts and deeds that have marked man's progress is to feel the great heart throbs of humanity through the centuries; and if one does not feel in these pulsations a heavenward striving, one must indeed be deaf to the harmonies of life.
Helen Keller, The Story Of My Life, 1902, 1903, 1905, introduction by Ralph Barton Perry (Garden City, NY: Doubleday & Company, 1954), p90.

Attachment: 0001-Change-default-client_connection_check_interval-to-2.patch (text/x-patch, 2.0K), inline:

From 7936fc4ad60e40b38629f372bff397bc42a0a7f5 Mon Sep 17 00:00:00 2001
From: Jeremy Schneider <[email protected]>
Date: Wed, 4 Feb 2026 21:08:32 -0800
Subject: [PATCH] Change default client_connection_check_interval to 2000ms

The default value of client_connection_check_interval is changed from 0 (disabled) to 2000ms (2 seconds). This enables periodic checking for client disconnection during long-running queries by default, which can help detect and clean up queries from disconnected clients more promptly.

A value of 0 continues to disable connection checking for users who prefer the previous behavior.
---
 src/backend/utils/misc/guc_parameters.dat     | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index f0260e6e412..91c0d740ce5 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -403,7 +403,7 @@
   long_desc => '0 disables connection checks.',
   flags => 'GUC_UNIT_MS',
   variable => 'client_connection_check_interval',
-  boot_val => '0',
+  boot_val => '2000',
   min => '0',
   max => 'INT_MAX',
   check_hook => 'check_client_connection_check_interval',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c4f92fcdac8..8dd89d6da4e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -87,7 +87,7 @@
 #tcp_user_timeout = 0                  # TCP_USER_TIMEOUT, in milliseconds;
                                         # 0 selects the system default
 
-#client_connection_check_interval = 0  # time between checks for client
+#client_connection_check_interval = 2000       # time between checks for client
                                         # disconnection while running queries;
                                         # 0 for never
 
--
2.43.0
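The check that this patch would turn on by default amounts to asking the kernel whether the client has closed its end of the connection, without reading any pending data from the socket. A minimal Python sketch of that idea, assuming Linux (where `select.POLLRDHUP` is available); `client_has_disconnected` is a hypothetical helper for illustration, not PostgreSQL's own code, which is written in C:

```python
import select
import socket


def client_has_disconnected(sock: socket.socket) -> bool:
    """Return True if the peer of `sock` appears to have closed the connection.

    Linux-only sketch: POLLRDHUP is reported once the peer has closed (or shut
    down the write side of) a stream socket, even if unread data is still queued.
    A zero timeout makes this a non-blocking probe that can be repeated on a timer.
    """
    poller = select.poll()
    poller.register(sock, select.POLLRDHUP)
    # poll() takes its timeout in milliseconds; 0 means "check and return now".
    # POLLHUP and POLLERR are always reported, even though we did not ask for them.
    events = poller.poll(0)
    closed_mask = select.POLLRDHUP | select.POLLHUP | select.POLLERR
    return any(mask & closed_mask for _fd, mask in events)


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    print(client_has_disconnected(server_side))  # False: peer still open
    client_side.close()
    print(client_has_disconnected(server_side))  # True: peer has gone away
    server_side.close()
```

The probe is cheap and non-destructive, which is what makes it reasonable to run periodically from inside a wait rather than only when the server next reads from or writes to the client.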
* Re: client_connection_check_interval default value
From: Jeremy Schneider @ 2026-02-05 13:36 UTC
To: [email protected] <[email protected]>
Cc: Marat Buharov <[email protected]>

On Wed, 4 Feb 2026 21:30:32 -0800
Jeremy Schneider <[email protected]> wrote:

> What would people here think about changing the default value of
> client_connection_check_interval to 2000 ms? Right now this is
> disabled by default.

Forgot the doc update in the attached patch. Updated.

-Jeremy

Attachment: 0001-Change-default-client_connection_check_interval-to-2.patch (text/x-patch, 3.1K), inline:

From b23d89338daaec9c381783fe6a9df9d148aeace7 Mon Sep 17 00:00:00 2001
From: Jeremy Schneider <[email protected]>
Date: Wed, 4 Feb 2026 21:08:32 -0800
Subject: [PATCH] Change default client_connection_check_interval to 2000ms

The default value of client_connection_check_interval is changed from 0 (disabled) to 2000ms (2 seconds). This enables periodic checking for client disconnection during long-running queries by default, which can help detect and clean up queries from disconnected clients more promptly.

A value of 0 continues to disable connection checking for users who prefer the previous behavior.
---
 doc/src/sgml/config.sgml                      | 9 +++++----
 src/backend/utils/misc/guc_parameters.dat     | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5560b95ee60..5bc7f029e80 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1059,10 +1059,11 @@ include_dir 'conf.d'
        </para>
        <para>
         If the value is specified without units, it is taken as milliseconds.
-        The default value is <literal>0</literal>, which disables connection
-        checks. Without connection checks, the server will detect the loss of
-        the connection only at the next interaction with the socket, when it
-        waits for, receives or sends data.
+        The default value is <literal>2000</literal> (2 seconds). A value of
+        <literal>0</literal> disables connection checks. Without connection
+        checks, the server will detect the loss of the connection only at the
+        next interaction with the socket, when it waits for, receives or sends
+        data.
        </para>
        <para>
         For the kernel itself to detect lost TCP connections reliably and within
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index f0260e6e412..91c0d740ce5 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -403,7 +403,7 @@
   long_desc => '0 disables connection checks.',
   flags => 'GUC_UNIT_MS',
   variable => 'client_connection_check_interval',
-  boot_val => '0',
+  boot_val => '2000',
   min => '0',
   max => 'INT_MAX',
   check_hook => 'check_client_connection_check_interval',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c4f92fcdac8..8dd89d6da4e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -87,7 +87,7 @@
 #tcp_user_timeout = 0                  # TCP_USER_TIMEOUT, in milliseconds;
                                         # 0 selects the system default
 
-#client_connection_check_interval = 0  # time between checks for client
+#client_connection_check_interval = 2000       # time between checks for client
                                         # disconnection while running queries;
                                         # 0 for never
 
--
2.43.0
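The documentation text in this patch describes the difference between the two settings: without checks, a session blocked on a lock only notices that its client is gone when it next touches the socket. A toy Python model of that wait loop, with a `threading.Event` standing in for the row lock and a hypothetical `peer_closed` probe; neither name is part of PostgreSQL, whose real implementation is C code built around its own wait and interrupt machinery. The probe here is a simpler alternative to a kernel "peer closed" event and assumes a Unix-like OS that provides `MSG_DONTWAIT`:

```python
import socket
import threading


def peer_closed(sock: socket.socket) -> bool:
    """Peek one byte without consuming it or blocking.

    b"" means the peer shut down its side. Caveat: if the client already sent
    data we have not read, this sees that data and cannot tell that the peer
    then closed, which is one reason a kernel "peer closed" event is preferable.
    """
    try:
        return sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT) == b""
    except BlockingIOError:
        return False  # nothing to read yet; connection still open
    except ConnectionError:
        return True  # e.g. connection reset by peer


def wait_for_lock(lock_released: threading.Event, client: socket.socket,
                  check_interval_s: float) -> str:
    """Toy model of a session blocked on a row lock.

    check_interval_s == 0 mimics client_connection_check_interval = 0: the loop
    sleeps until the lock is released and never looks at the client socket.
    A positive value wakes the loop every interval to probe the client.
    """
    timeout = check_interval_s if check_interval_s > 0 else None
    while True:
        if lock_released.wait(timeout):
            return "lock acquired"
        # Only reached when a timeout is set, i.e. checks are enabled.
        if peer_closed(client):
            return "client gone: abort and free the connection slot"
```

If the lock is never released, a call with `check_interval_s=0` blocks forever even after the client has disconnected, which is the connection-slot leak described in this thread; with a 2-second interval it returns within roughly 2 seconds of the disconnect.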
* Re: client_connection_check_interval default value
From: Greg Sabino Mullane @ 2026-02-05 15:00 UTC
To: Jeremy Schneider <[email protected]>
Cc: [email protected] <[email protected]>; Marat Buharov <[email protected]>

I'm a weak -1 on this. Certainly not 2s! That's a lot of context switching for a busy system for no real reason. Also see this past discussion:

https://www.postgresql.org/message-id/CTEB8LNLOHKR.3I6NK8QVBAGSQ@gonk

--
Cheers,
Greg
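To put the context-switching concern into rough numbers: the sample configuration comment in the patch describes the checks as happening "while running queries", so a back-of-envelope upper bound on the extra timer wakeups is the number of backends currently executing a statement divided by the interval. A Python sketch under that assumption; the function name is hypothetical and the inputs are illustrative, not measurements:

```python
def check_wakeups_per_second(executing_backends: int, interval_ms: int) -> float:
    """Rough upper bound on server-wide wakeups per second caused by the check.

    Assumes each backend that is running a query (including one blocked on a
    lock) wakes once per interval to probe its socket, and that idle sessions
    add nothing. interval_ms == 0 disables the check entirely.
    """
    if interval_ms <= 0:
        return 0.0
    return executing_backends * 1000.0 / interval_ms


if __name__ == "__main__":
    for interval in (0, 2000, 10000):
        print(interval, check_wakeups_per_second(500, interval))
```

For example, 500 backends executing at once with a 2000 ms interval gives about 250 wakeups per second across the whole server; whether that matters on a given system is exactly what the performance measurements discussed later in the thread address.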
* Re: client_connection_check_interval default value
From: Jacob Champion @ 2026-02-05 17:26 UTC
To: Jeremy Schneider <[email protected]>
Cc: [email protected] <[email protected]>; Marat Buharov <[email protected]>

On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider <[email protected]> wrote:
> While a fix has been merged in pgx for the most direct root cause of
> the incident I saw, this setting just seems like a good behavior to
> make Postgres more robust in general.

At the risk of making perfect the enemy of better, the protocol-level heartbeat mentioned in the original thread [1] would cover more use cases, which might give it a better chance of eventually becoming default behavior. It might also be a lot of work, though.

--Jacob

[1] https://postgr.es/m/CA%2BhUKGLyj5Aqt6ojYfSc%2BqSeB1x%3D3RbU61hnus5sL0BKqEBsLw%40mail.gmail.com
* Re: client_connection_check_interval default value
From: Jeremy Schneider @ 2026-02-05 23:04 UTC
To: Jacob Champion <[email protected]>
Cc: [email protected] <[email protected]>; Marat Buharov <[email protected]>; Greg Sabino Mullane <[email protected]>; Thomas Munro <[email protected]>

One interesting thing to me: it seems like all of the past mail threads were focused on a situation different from mine. Lots of discussion about freeing resources like CPU. In the outage I saw, the system was idle and we completely ran out of max_connections because all sessions were waiting on a row lock.

Importantly, the app was closing these conns but we had sockets stacking up on the server in CLOSE-WAIT state, and postgres simply never cleaned them up until we had an outage. The processes were completely idle waiting for a row lock that was not going to be released. Impact could have been isolated to sessions hitting that row (with this GUC), but it escalated to a system outage.

It's pretty simple to reproduce this:
https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion

On Thu, 5 Feb 2026 09:26:34 -0800
Jacob Champion <[email protected]> wrote:

> On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider
> <[email protected]> wrote:
> > While a fix has been merged in pgx for the most direct root cause of
> > the incident I saw, this setting just seems like a good behavior to
> > make Postgres more robust in general.
>
> At the risk of making perfect the enemy of better, the protocol-level
> heartbeat mentioned in the original thread [1] would cover more use
> cases, which might give it a better chance of eventually becoming
> default behavior.
> It might also be a lot of work, though.

It seems like a fair bit of discussion is around OS coverage; even Thomas' message there references keepalive working as expected on Linux. Tom objects in 2023 that "the default behavior would then be platform-dependent and that's a documentation problem we could do without." But it's been five years: has there been further work on implementing a postgres-level heartbeat? And I see other places in the docs where we note platform differences; is it really such a big problem to change the default here?

On Thu, 5 Feb 2026 10:00:29 -0500
Greg Sabino Mullane <[email protected]> wrote:

> I'm a weak -1 on this. Certainly not 2s! That's a lot of context
> switching for a busy system for no real reason. Also see this past
> discussion:

In the other thread I see larger perf concerns with some early implementations, before they refactored the patch? Konstantin's message on 2019-08-02 said he didn't see much difference, that the value of the timeout didn't seem to matter, and that if anything the marginal effect came simply from the presence of any timer (the same effect as setting statement_timeout). Later in the thread it seems like Thomas also saw minimal performance concern here.

I did see a real system outage that could have been prevented by an appropriate default value here, since I didn't yet know to change it.

-Jeremy

Attachment: client_connection_check.png (image/png, 45.7K)
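The symptom described here, sockets piling up in CLOSE-WAIT on the server while their backends sit in lock waits, can be checked for from the database host. A minimal Linux-only Python sketch that counts CLOSE-WAIT sockets on the server port by reading /proc/net/tcp and /proc/net/tcp6 (where state code 08 is CLOSE_WAIT); port 5432 is an assumed default and both function names are hypothetical. Tools such as `ss` report the same socket states:

```python
from pathlib import Path

TCP_CLOSE_WAIT = 0x08  # state code for CLOSE_WAIT in Linux /proc/net/tcp


def count_close_wait(local_port: int, text: str) -> int:
    """Count CLOSE_WAIT sockets bound to local_port in /proc/net/tcp-format text."""
    count = 0
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        # local_address is "HEXIP:HEXPORT"; the port is plain big-endian hex.
        port = int(local_address.rsplit(":", 1)[1], 16)
        if port == local_port and int(state, 16) == TCP_CLOSE_WAIT:
            count += 1
    return count


def close_wait_on_port(local_port: int = 5432) -> int:
    """Sum CLOSE_WAIT sockets for local_port over IPv4 and IPv6 (Linux only)."""
    total = 0
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_close_wait(local_port, path.read_text())
    return total


if __name__ == "__main__":
    print("CLOSE-WAIT sockets on :5432:", close_wait_on_port(5432))
```

A count that keeps growing while the client side has already given up is consistent with the situation in this thread: the kernel knows the peer is gone, but the backend has not looked at its socket since it started waiting.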
end of thread

Thread overview: 5+ messages
2026-02-05 05:30 client_connection_check_interval default value Jeremy Schneider <[email protected]>
2026-02-05 13:36 ` Jeremy Schneider <[email protected]>
2026-02-05 15:00 ` Greg Sabino Mullane <[email protected]>
2026-02-05 17:26 ` Jacob Champion <[email protected]>
2026-02-05 23:04   ` Jeremy Schneider <[email protected]>