Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uBvII-008HcA-Aw for pgsql-general@arkaria.postgresql.org; Mon, 05 May 2025 12:51:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uBvIG-003P8g-Lk for pgsql-general@arkaria.postgresql.org; Mon, 05 May 2025 12:51:36 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uBvIG-003P8Y-6R for pgsql-general@lists.postgresql.org; Mon, 05 May 2025 12:51:36 +0000 Received: from mail-pl1-x632.google.com ([2607:f8b0:4864:20::632]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uBvID-000HCo-1A for pgsql-general@lists.postgresql.org; Mon, 05 May 2025 12:51:35 +0000 Received: by mail-pl1-x632.google.com with SMTP id d9443c01a7336-2255003f4c6so42785505ad.0 for ; Mon, 05 May 2025 05:51:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kset.org; s=google; t=1746449491; x=1747054291; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=fx5ZBgCjNa40ykxt2DyXE00t0le11KfNQl4y6U+cLqg=; b=EIfvWkwV9bJTSuxIv61YcNAKQYtemlr8tr8ih8B1+8L4QXWWSJwazXH6+Kj8ctwTHR sUiOCpdRseYg0fjqXNvzFm0nFKZOcK5nA67rQRSHEQdumS47VUPfX1vRA8/7Ae+cLlFm 9+zGdtLK1Dtih5jM5jWIJFtjj/oTOrGtDuVhy6T5JeJhW4pZXEpZE3wIOaGLO5t5MkkJ Ut8mSNzwc67MYuE0qCmE7sdiowj3WwvNUSxukoKaUmiwq6LnhW3rwpmK+fCEfkz328jw oQdlvnJf3EK+/r8Xq0Yz7jKo7K86rOhSKkSJw8Aq6n6z1ffPl5/n0bd+zw4SnVOjDqD6 9rCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746449491; x=1747054291; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=fx5ZBgCjNa40ykxt2DyXE00t0le11KfNQl4y6U+cLqg=; b=weKrk3HOTaZnXjBICY1tK3b55Z3wzfvdoi2wYfDE9zYJZoML2S6oJFjgXLAx3AkX+1 nhJMdjfSvG/eb+IIEH1hcHgh4KIR+lkF0wgj8yxyhi9ZMFc0mL65ofqHWx/dzKGfDdF8 e6ztKMLGHKb/vh3mctd+oGjWgo4HNmXs1gppxq4MD+GCcuPjviyNOYCA+9d1L6EkW+d+ Wk7VRjyIOzGqRYd8kuyXSB4YR1z7/NHkausfA6VNxyzdiDPmJSn8Ax0rgs8U35rc06PS LwzBYJVRj73Ho+dls+mupE304AeXVRuwgaARFOU0bS+AoRD6rmVC8vRCk1N9Wu4wTWWU Nhyw== X-Gm-Message-State: AOJu0Yx7s2F8FKfqTpgCZfpemaeBwlFqPhEw5UN8vgLiqExtmClWCBJL 5fjlj9uHK69SnPA1RKFhiporXAQZwv5L835NJoxwUAkQvvbcWTLU/OQ1wESJqYNRW85I1AxzGLL acCvvQKgWGhx0jalrHVik2mD5X22sv7gln3/SEQ== X-Gm-Gg: ASbGncvgl3VDGEMJZlFXrQmqcEpAMAPZIP2nKNtt5SZ+MbVe1oSke3frKbvbbOBAmQ7 CncG9Vbg3E59Gkx15thF7zzGykaFXsv7sqpUyX8PxsS8sRFr8cWRIQepq6JAmkBRXQznHvQixHd tGw2Cg/4pIsZ6vzNudELhLOA== X-Google-Smtp-Source: AGHT+IGT9d/kiLvrl/V6Ax5TmTapPC8moP07UgljE89TW8tEGR38Q2I/k5D3Jq/4TE2xgIY85ld7whR0VVIEIXQh34s= X-Received: by 2002:a17:902:d487:b0:22d:c52e:2dae with SMTP id d9443c01a7336-22e102e0ff4mr166044055ad.18.1746449491585; Mon, 05 May 2025 05:51:31 -0700 (PDT) MIME-Version: 1.0 References: <4ec38b6d-bd92-4055-8d2f-7efa583a2b9f@cloud.gatewaynet.com> In-Reply-To: From: =?UTF-8?Q?Mladen_Marinovi=C4=87?= Date: Mon, 5 May 2025 14:51:20 +0200 X-Gm-Features: ATxdqUGQY75BrEft6o8iiGZA3-oHaz-JLw0PVF-_8QWooBJeXiFCy95olrcLNmI Message-ID: Subject: Re: Different execution plans in PG17 and pgBouncer... To: Achilleas Mantzios Cc: "pgsql-general@lists.postgresql.org" Content-Type: multipart/alternative; boundary="000000000000907ffb063462f57c" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000907ffb063462f57c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, May 5, 2025 at 2:36=E2=80=AFPM Achilleas Mantzios < a.mantzios@cloud.gatewaynet.com> wrote: > > On 5/5/25 13:27, Mladen Marinovi=C4=87 wrote: > > > > On Mon, May 5, 2025 at 12:07=E2=80=AFPM Achilleas Mantzios < > a.mantzios@cloud.gatewaynet.com> wrote: > >> >> On 5/5/25 11:00, Mladen Marinovi=C4=87 wrote: >> >> >> >> On Mon, May 5, 2025 at 11:24=E2=80=AFAM Achilleas Mantzios < >> a.mantzios@cloud.gatewaynet.com> wrote: >> >>> >>> On 5/5/25 09:52, Mladen Marinovi=C4=87 wrote: >>> >>> Hi, >>> >>> We recently migrated our production instances from PG11 to PG17. While >>> doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As >>> everything worked on the test servers we pushed this to production a fe= w >>> weeks ago. We did not notice any problems until a few days ago (but the >>> problems were here from the start). The main manifestation of the probl= ems >>> is a service that runs a fixed query to get a backlog of unprocessed da= ta >>> (limited to a 1000 rows). When testing the query using pgAdmin connecte= d >>> directly to the database we get a result in cca. 20 seconds. The same q= uery >>> runs for 2 hours when using pgBouncer to connect to the same database. >>> >>> >>> That's a huge jump, I hope you guys did extensive testing of your app. >>> In which language is your app written? If java, then define >>> prepareThreshold=3D0 in your jdbc and set max_prepared_statements =3D 0= in >>> pgbouncer. >>> >> Mainly python, but the problem was noticed in a java service. >> Prepare treshold was already set to 0. We changed the max_prepared_stat= ements >> to 0 from the default (200) but no change was noticed. >> >>> How about search paths ? any difference on those between the two runs ? >>> Do you set search_path in pgbouncer ? what is "cca." btw ? >>> >>> >>> The more interesting part is that when we issue an explain of the same >>> query we get different plans. We did this a few seconds apart so there >>> should be no difference in collected statistics. We ruled out prepared >>> statements, as we suspected the generic plan might be the problem, but = it >>> is not. Is there any pgBouncer or PG17 parameter that might be the caus= e of >>> this? >>> >>> >>> Does this spawn any connections (such as dblink) ? are there limits per >>> user/db pool_size in pgbouncer ? >>> >> No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT= , >> UPDATE, DELETE,...) There are limits, but they are not hit. The query ju= st >> uses a different plan and runs slower because of that. >> >>> Pgbouncer, in contrast to its old friend PgPool-II is completely >>> passive, just passes through SQL to the server as fast as possible as i= t >>> can. But I am sure you know that. Good luck, keep us posted! >>> >> Yes, that is what puzzles me. >> >> What is the pgbouncer's timeout in the server connections ? >> >> How about "idle in transaction" ? do you get any of those? What's the >> isolation level ? >> >> How about the user ? is this the same user doing pgadmin queries VS via >> the app ? >> >> Can you identify the user under which the problem is manifested and : >> >> ALTER user "unlucky_user" SET log_statement =3D 'all'; >> >> ALTER user "unlucky_user" SET log_min_duration_statement =3D 0; -- to he= lp >> you debug the prepared statements .. just in case , and other stuff not >> printed by log_statement =3D all. >> > None of those parameters should affect the fact that when issuing the > explain select query (the statement is not prepared) from psql directly > gives a different result than issuing it over the pgbouncer connection. T= he > result is repeatable. > > We have rolled back pgbouncer to 1.12. and it seems the problem persists. > This is one of the weirdest things I have ever seen with PostgreSQL. > > > ok, this is something, at least one more extreme thought ruled out. How > about search_path ? is this the SAME user that is issuing the statements > in pgadmin VS pgbouncer ? > The user is the same, the search path is the same: show search_path ; search_path ----------------- "$user", public (1 row) There is only one schema (public) so there is no posibility that one connection uses different tables. As I remember correctly the planner uses the postgresql parameters (from the conf file visible in pg_settings) and table/row statistics to choose a plan. It is unclear to me how those numbers can be different when using pgbouncer (I have tried selecting the configuration using both connections and it is the same). > Is there a connect_query inside pgbouncer's conf ? > The connect_query parameter is not used. > you have to show all configuration involved and also full logging on the > backend for said user. > I can provide redacted explains if it would help. Full logging is not feasible. > > >> Regards, >> Mladen Marinovi=C4=87 >> >> --000000000000907ffb063462f57c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Mon, May 5, = 2025 at 2:36=E2=80=AFPM Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com> wrote:
=20 =20 =20


On 5/5/25 13:27, Mladen Marinovi=C4=87 wrote:
=20


On Mon, May 5, 2025 at 12:07=E2=80=AFPM Achilleas Mantzios <a.mantzios@cloud.gatewaynet.c= om> wrote:


On 5/5/25 11:00, Mladen Marinovi=C4=87 wrote:


On Mon, May 5, 2025 at 11:24=E2=80=AFAM Achilleas Mantzios <a.mantzios@= cloud.gatewaynet.com> wrote:


On 5/5/25 09:52, Mladen Marinovi=C4=87 wrote:<= br>
Hi,

We recently migrated our production instances from PG11 to PG17. While doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As everything worked on the test servers we pushed this to production a few weeks ago. We did not notice any problems until a few days ago (but the problems were here from the start). The main manifestation of the problems is a service that runs a fixed query to get a backlog of unprocessed data (limited to a 1000 rows). When testing the query using pgAdmin connected directly to the database we get a result in cca. 20 seconds. The same query runs for 2 hours when using pgBouncer to connect to the same database.


That's a huge jump, I hope you guys did extensive testing of your app. In which language is your app written? If java, then define prepareThreshold=3D0 in your jdbc and set max_prepared_statemen= ts =3D 0 in pgbouncer.

Mainly python, but the problem was noticed in a java service.
Prepare treshold was already set to 0. We changed the=C2=A0 max_p= r= epared_statements to 0 from the default (200) but no change was noticed.

How about search paths ? any difference on those between the two runs ? Do you set search_path in pgbouncer ? what is "cca.&quo= t; btw ?


The more interesting part is that when we issue an explain of the same query we get different plans. We did this a few seconds apart so there should be no difference in collected statistics. We ruled out prepared statements, as we suspected the generic plan might be the problem, but it is not. Is there any pgBouncer or PG17 parameter that might be the cause of this?


Does this spawn any connections (such as dblink) ? are there limits per user/db pool_size in pgbouncer ?

No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...) There are limits, but they are not hit. The query just uses a different plan and runs slower because of that.

Pgbouncer, in contrast to its old friend PgPool-II is completely passive, just passes through SQL to the server as fast as possible as it can. But I am sure you know that. Good luck, keep us posted!

Yes, that is what puzzles me.

What is the pgbouncer's timeout in the server connections ?

How about "idle in transaction" ? do you get any= of those? What's the isolation level ?

How about the user ? is this the same user doing pgadmin queries VS via the app ?

Can you identify the user under which the problem is manifested and :

ALTER user "unlucky_user" SET log_statement =3D = 'all';

ALTER user "unlucky_user" SET log_min_duration_statement =3D 0; -- to help you debug the prepared statements .. just in case , and other stuff not printed by log_statement =3D all.

None of those parameters should affect the fact that when issuing the explain select query (the statement is not prepared) from psql directly gives a different result than issuing it over the pgbouncer connection. The result is repeatable.

We have rolled back pgbouncer to 1.12. and it seems the problem persists. This is one of the weirdest things I have ever seen with PostgreSQL.


ok, this is something, at least one more extreme thought ruled out. How about search_path ?=C2=A0 is this the SAME user that is issuing the statements in pgadmin VS pgbouncer ?

The user is the same, the search path is the same:
show se= arch_path ;
=C2=A0 =C2=A0search_path =C2=A0
-----------------
=C2= =A0"$user", public
(1 row)

There = is only one schema (public) so there is no posibility that one connection u= ses different tables. As I remember correctly the planner uses the postgres= ql parameters (from the conf file visible in pg_settings) and table/row sta= tistics to choose a plan. It is unclear to me how those numbers can be diff= erent when using pgbouncer (I have tried selecting the configuration using = both connections and it is the same).
=C2=A0

Is there a connect_query inside pgbouncer's conf ?

<= div>The=20 connect_query parameter is not used.

you have to show all configuration involved and also full logging on the backend for said user.

I can pr= ovide redacted explains if it would help. Full logging is not feasible.


=C2=A0
Regards,
Mladen Marinovi=C4=87
--000000000000907ffb063462f57c--