References: <20260204213032.15bab46b@ardentperf.com> <20260205150452.00006167@ardentperf.com>
In-Reply-To: <20260205150452.00006167@ardentperf.com>
From: Fujii Masao
Date: Fri, 6 Feb 2026 08:42:42 +0900
Subject: Re: client_connection_check_interval default value
To: Jeremy Schneider
Cc: Jacob Champion, "pgsql-hackers@lists.postgresql.org", Marat Buharov, Greg Sabino Mullane, Thomas Munro

On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider wrote:
>
> One interesting thing to me - it seems like all of the past mail
> threads were focused on a situation different from mine. Lots of
> discussion about freeing resources like CPU.
>
> In the outage I saw, the system was idle and we completely ran out of
> max_connections because all sessions were waiting on a row lock.
>
> Importantly, the app was closing these conns but we had sockets stacking
> up on the server in CLOSE-WAIT state - and postgres simply never
> cleaned them up until we had an outage. The processes were completely
> idle waiting for a row lock that was not going to be released.
>
> Impact could have been isolated to sessions hitting that row (with this
> GUC), but it escalated to a system outage. It's pretty simple to
> reproduce this:
> https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion
>
>
> On Thu, 5 Feb 2026 09:26:34 -0800
> Jacob Champion wrote:
>
> > On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider
> > wrote:
> > > While a fix has been merged in pgx for the most direct root cause of
> > > the incident I saw, this setting just seems like a good behavior to
> > > make Postgres more robust in general.
> >
> > At the risk of making perfect the enemy of better, the protocol-level
> > heartbeat mentioned in the original thread [1] would cover more use
> > cases, which might give it a better chance of eventually becoming
> > default behavior. It might also be a lot of work, though.
>
> It seems like a fair bit of discussion is around OS coverage - even
> Thomas' message there references keepalive working as expected on
> Linux. Tom objects in 2023 that "the default behavior would then be
> platform-dependent and that's a documentation problem we could do
> without."
>
> But it's been five years - has there been further work on implementing
> a postgres-level heartbeat? And I see other places in the docs where we
> note platform differences, is it really such a big problem to change
> the default here?
>
>
> On Thu, 5 Feb 2026 10:00:29 -0500
> Greg Sabino Mullane wrote:
>
> > I'm a weak -1 on this. Certainly not 2s! That's a lot of context
> > switching for a busy system for no real reason. Also see this past
> > discussion:
>
> In the other thread I see larger perf concerns with some early
> implementations before they refactored the patch?
> Konstantin's message on 2019-08-02 said he didn't see much difference,
> and the value of the timeout didn't seem to matter, and if anything the
> marginal effect was simply from the presence of any timer (same effect
> as setting statement_timeout) - and later on the thread it seems like
> Thomas also saw minimal performance concern here.
>
> I did see a real system outage that could have been prevented by an
> appropriate default value here, since I didn't yet know to change it.

I'm not sure that client_connection_check_interval needs to be enabled
by default. However, if we do agree to change the default and apply it,
I think we should first address a related issue: with log_lock_waits
enabled by default, setting client_connection_check_interval to 2s would
cause "still waiting" messages to be logged every 2 seconds while a
backend waits on the lock. That could result in a lot of noisy logging
under default settings.

The issue is that backends blocked in ProcSleep() are woken up every
client_connection_check_interval and may emit a "still waiting" message
each time if log_lock_waits is enabled. To mitigate this, one idea is to
add a flag that tracks whether the "still waiting" message has already
been emitted during a call to ProcSleep(), and to suppress further
messages once it has been logged.

Regards,

-- 
Fujii Masao
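
A minimal sketch of the flag idea above, written as a standalone C
simulation rather than actual PostgreSQL code. The function
simulate_lock_wait(), the CHECK_INTERVAL_MS constant, and the loop
structure are hypothetical stand-ins; the real ProcSleep() logic is more
involved, and the sketch assumes the first message is already due at the
first wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for client_connection_check_interval (2s). */
#define CHECK_INTERVAL_MS 2000

/*
 * Simulate one lock wait lasting total_wait_ms in which the backend is
 * woken every CHECK_INTERVAL_MS. Returns how many "still waiting"
 * messages were emitted. When suppress_repeats is true, a flag local to
 * this call allows only the first message through.
 */
static int
simulate_lock_wait(int total_wait_ms, bool suppress_repeats)
{
    bool logged_still_waiting = false;  /* reset for each wait */
    int  messages = 0;
    int  elapsed_ms;

    assert(total_wait_ms >= 0);

    for (elapsed_ms = CHECK_INTERVAL_MS;
         elapsed_ms <= total_wait_ms;
         elapsed_ms += CHECK_INTERVAL_MS)
    {
        /* Woken by the periodic connection check; lock still not granted. */
        if (suppress_repeats && logged_still_waiting)
            continue;           /* already reported during this wait */

        printf("LOG: still waiting for lock after %d ms\n", elapsed_ms);
        logged_still_waiting = true;
        messages++;
    }

    return messages;
}
```

Calling simulate_lock_wait(10000, false) models the current behavior,
with one message per wakeup, while simulate_lock_wait(10000, true)
models the proposed flag. Keeping the flag local to the call means each
new lock wait still gets its first "still waiting" message.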