Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1svVt2-00Fd3D-0d for pgsql-general@arkaria.postgresql.org; Tue, 01 Oct 2024 05:57:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1svVt0-00B9FJ-KU for pgsql-general@arkaria.postgresql.org; Tue, 01 Oct 2024 05:57:26 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1svVt0-00B9FB-8L for pgsql-general@lists.postgresql.org; Tue, 01 Oct 2024 05:57:26 +0000 Received: from mail-vk1-xa34.google.com ([2607:f8b0:4864:20::a34]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1svVsr-001xk1-8s for pgsql-general@lists.postgresql.org; Tue, 01 Oct 2024 05:57:24 +0000 Received: by mail-vk1-xa34.google.com with SMTP id 71dfb90a1353d-50abf1bd152so551470e0c.2 for ; Mon, 30 Sep 2024 22:57:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727762237; x=1728367037; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=/dAZEOSDnXXzUD2YhLsfwURsT98Ykw55pqxrG4YVrVg=; b=gPAWS1HYva0qKFycS6S6Hf7ZZ4x6LGZMmCqeZBF9Yckx2q/ypjWWnPJtF68KCLDW+l Zh1SFKZwv5BukbOpM504To2CjDYyrFOh4vdK9qMYM2tGq+iuvpSAQDTbhOUXcRVFZg1u 0RMdX9tPhxap4C4lkiAfYQBAsJBmEh6DLc1jDLIgUzXIoIA5jzj7NPqLl9jXg1S4Q5le 4CtBowSZtKT7bA6CUR3pYGt03sIieTnWAMVCDLGWUr5Sme8C31jr8g+2jRj0EZl/62RT njo9TijLLXh++2Cl+bkwISw549Q7Gdwa4qZ0qNkwlh1roNR5zL4G6p36zzK1EeqjLTT6 WkBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727762237; x=1728367037; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=/dAZEOSDnXXzUD2YhLsfwURsT98Ykw55pqxrG4YVrVg=; b=XeY4puDZSb++CXHozUQnreJ/V8RZJKQPEfo3I1/jV2p9iewO17J+DwRFSA0TOL4WQj EgmnThVYl5SSvctdY7E03aZI6Kf9i+TUnSOQjy0GHYrCxGCQ3uvzeY50uztgQRlfNJ2m FM6doH0HWgJmYvgFeHcFiVrj+q7UTXkjbJur8OIMVydexzP1D3G7+GUiAD6inrQUJuHM 8A+cPgK7jc4Qaecgh2qG3vnzTAiFOPpcSjaxYeDLZDWWitiIrz0LcaL629UQJCw1M0DA FK7hYirTQb2XgK0g9pxh8S0c7XjXshEowpLU/pXieGFpTVMqe2e1mTYA3aHX38BLVBvt A3LA== X-Gm-Message-State: AOJu0YzIfgDAv6wXUvYFX798lF/J4u2CkI09Fj7mDvZiCkkNhdzSl6gO H5Iv1eXh7By+Wgpo+PKxLZX/wN9z+6JoXSTPlyg2JGy86H0WfOtVtbiYzbdf7vuHvMA85t5JWah Bzai2ssmiYN+uD6nnozIJACqLWfhT7fLh X-Google-Smtp-Source: AGHT+IFCW8PT6o/O2/LtrHomaMk9wKa44qag+IQp3m3jLuSGS6WZojgZcqGNPEcHfMyMfx8jM9iGhMxlU/9/4yljXYc= X-Received: by 2002:a05:6122:8c4:b0:4f5:199b:2a61 with SMTP id 71dfb90a1353d-507818a39ebmr9064543e0c.9.1727762236815; Mon, 30 Sep 2024 22:57:16 -0700 (PDT) MIME-Version: 1.0 References: <3133182a-35c4-4a81-be60-8ab227c0fbc0@aklaver.com> In-Reply-To: <3133182a-35c4-4a81-be60-8ab227c0fbc0@aklaver.com> From: sud Date: Tue, 1 Oct 2024 11:27:05 +0530 Message-ID: Subject: Re: Question on session timeout To: Adrian Klaver Cc: pgsql-general Content-Type: multipart/alternative; boundary="00000000000061c04d062363fe5f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000061c04d062363fe5f Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Oct 1, 2024 at 4:10=E2=80=AFAM Adrian Klaver wrote: > On 9/30/24 13:01, sud wrote: > > Hello, > > We are frequently seeing the total DB connection reaching ~2000+ wherea= s > > Where are you getting the ~2000 count from? > > > the total number of active sessions in pg_stat_activity staying <100 at > > any point in time. And when we see the sessions from DB side they are > > showing most of the sessions with state as 'idle' having > > backend_start/xact_start showing date ~10days older. We do use > > application level connection pooling, and we have ~120 sets as both the > > What do you mean by ~120 sets, in particular what is a set? > > > "max idle" and "max active" connection count and "maxage" as 7 days, so > > does this suggest any issue at connection pool setup? > > Using what pooler? > > > > > We do see keep alive queries in the DB (select 1), not sure if that is > > making this scenario. When checking the > > How often do to keep alive queries run? > > > "idle_in_transaction_session_timeout" it is set as 24hours and > > "idle_session_timeout" set as "0". So my question is , should we set th= e > > parameter to a lesser value in DB cluster level like ~5minutes or so, s= o > > as not to keep the idle sessions lying so long in the database and what > > '"idle_in_transaction_session_timeout" it is set as 24hours' is a foot > gun as explained here: > > https://www.postgresql.org/docs/current/runtime-config-client.html > > idle_in_transaction_session_timeout (integer) > > [...] > > "This option can be used to ensure that idle sessions do not hold locks > for an unreasonable amount of time. Even when no significant locks are > held, an open transaction prevents vacuuming away recently-dead tuples > that may be visible only to this transaction; so remaining idle for a > long time can contribute to table bloat. See Section 24.1 for more > details." > > > With '"idle_session_timeout" set as "0"' a session without an open > transaction is not going to timeout. > > > *Where are you getting the ~2000 count from?* Seeing this in the "performance insights" dashboard and also its matching when I query the count of sessions from pg_stat_activity. *What do you mean by ~120 sets, in particular what is a set?*These are the values set as mentioned in the properties file which the application team uses for connection pooling. *Using what pooler?*I need to check on this as Its Java application(jdbc driver for connecting to DB), so I thought it must be using standard connection pooling. Will double check. *How often do to keep alive queries run?*Need to check. But I am not sure, in general , if these "keep alive" queries are used for keeping a transaction alive or a session alive? As you described, a long time open transaction with a session state as "idle" will be threatening as that will cause locking and "transaction ID wrap around" issues to surface whereas having "idle sessions" of a closed transaction may not cause any issue as they will do no harm. Does it mean we can have any number of idle sessions or we should also have some non zero "timeout" setup for the "ide_session_timeout" parameter too (maybe ~1hr or so)? Is it correct to assume the session in pg_stat_activity with very old XACT_START are the one which are part of long running open transaction(i.e. driven by idle_in_transaction_session_timeout) whereas the ones with older BACKEND_START or QUERY_START are the one are just the idle session(driven by idle_session_timeout) but not tied to any open transaction? Few observations:- I do see, "MaximumUsedTransactionIDs" staying consistently ~200M for a long time then coming down. And its matching to "autovacuum_freeze_max_age" which is set as 200M. Hope it's expected. We have max_connections set as 5000. "Database connection" touching ~2000 then coming down till 200. And we see uneven spikes in those, it seems to be matching with the pattern , when we have some errors occurring during the insert queries which are submitted by the Java application to insert the data into the tables. --00000000000061c04d062363fe5f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Tue, Oct 1, 2024 at 4:10=E2=80=AFA= M Adrian Klaver <adrian.kla= ver@aklaver.com> wrote:
On 9/30/24 13:01, sud wrote:
> Hello,
> We are frequently seeing the total DB connection reaching ~2000+ where= as

Where are you getting the ~2000=C2=A0 count from?

> the total number of active sessions in pg_stat_activity staying <10= 0 at
> any point in time. And when we see the sessions from DB side they are =
> showing most of the sessions with state as 'idle' having
> backend_start/xact_start showing date ~10days older. We do use
> application level connection pooling, and we have ~120 sets as both th= e

What do you mean by ~120 sets, in particular what is a set?

> "max idle" and "max active" connection count and &= quot;maxage" as 7 days, so
> does this suggest any issue at connection pool setup?

Using what pooler?

>
> We do see keep alive queries in the DB (select 1), not sure if that is=
> making this scenario. When checking the

How often do to keep alive queries run?

> "idle_in_transaction_session_timeout" it is set as 24hours a= nd
> "idle_session_timeout" set as "0". So my question = is , should we set the
> parameter to a lesser value in DB cluster level like ~5minutes or so, = so
> as not to keep the idle sessions lying so long in the database and wha= t

'"idle_in_transaction_session_timeout" it is set as 24hours&#= 39; is a foot
gun as explained here:

https://www.postgresql.org/docs/cu= rrent/runtime-config-client.html

idle_in_transaction_session_timeout (integer)

[...]

"This option can be used to ensure that idle sessions do not hold lock= s
for an unreasonable amount of time. Even when no significant locks are
held, an open transaction prevents vacuuming away recently-dead tuples
that may be visible only to this transaction; so remaining idle for a
long time can contribute to table bloat. See Section 24.1 for more details.= "


With '"idle_session_timeout" set as "0"' a sess= ion without an open
transaction is not going to timeout.




Where are you getting the ~2000 = =C2=A0count from?
Seeing this in the "performance insights"= ; dashboard and also its matching when I query the count of sessions from p= g_stat_activity.

What do you mean by ~120 sets, in particular wha= t is a set?
These are the values set as mentioned in the properties = file which the application team uses for connection pooling.

Usin= g what pooler?
I need to check on this as Its Java application(jdbc = driver for connecting to DB), so I thought it must be using standard connec= tion pooling. Will double check.

How often do to keep alive queri= es run?
Need to check. But I am not sure, in general , if these &quo= t;keep alive" queries are used for keeping a transaction alive or a se= ssion alive?

As you described, a long time open transaction with a s= ession state as "idle" will be threatening as that will cause loc= king and "transaction ID wrap around" issues to surface whereas h= aving "idle sessions" of a closed transaction may not cause any i= ssue as they will do no harm. Does it mean we can have any number of idle s= essions or we should also have some non zero "timeout" setup for = the "ide_session_timeout" parameter too (maybe=C2=A0~1hr or so)?<= br>
Is it correct to assume the session in pg_stat_activity with very ol= d XACT_START are the one which are part of long running open transaction(i.= e. driven by idle_in_transaction_session_timeout) whereas the ones with old= er BACKEND_START or QUERY_START are the one are just the idle session(drive= n by idle_session_timeout) but not tied to any open transaction?

Few= observations:-

I do see, "Maximum= UsedTransactionIDs" staying consistently ~200M for a long time then=C2= =A0coming down. And its matching to "autovacuum_freeze_max_age" w= hich is set as 200M. Hope it's expected. We have max_connections set as= 5000.

"Database connection" touching ~2000 then coming d= own till 200. And we see uneven spikes in those, it seems to be matching wi= th the pattern , when we have some errors occurring during the insert queri= es which are submitted by the Java application to insert the data into the = tables.

=C2=A0
--00000000000061c04d062363fe5f--