Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uBvq9-008OVU-Qw for pgsql-general@arkaria.postgresql.org; Mon, 05 May 2025 13:26:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uBvq8-003eH5-KB for pgsql-general@arkaria.postgresql.org; Mon, 05 May 2025 13:26:36 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uBvq8-003eGx-4g for pgsql-general@lists.postgresql.org; Mon, 05 May 2025 13:26:36 +0000 Received: from mail-pf1-x42b.google.com ([2607:f8b0:4864:20::42b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uBvq5-000HT1-03 for pgsql-general@lists.postgresql.org; Mon, 05 May 2025 13:26:35 +0000 Received: by mail-pf1-x42b.google.com with SMTP id d2e1a72fcca58-7398d65476eso3449338b3a.1 for ; Mon, 05 May 2025 06:26:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kset.org; s=google; t=1746451591; x=1747056391; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=x+PRYOwtGGI2VCJ+ZRElyfMk1CW/EKThNxP9ZdoyV/8=; b=ZXWA6mgLccTng458HkGIRdgw2LAT3pBlYvvSmgCEo1s1on4DA0/2V4RJEJG/Q4Rp/Z PW46GIEADOI0JE/HkVm5myEsgpibmWN1vSaJIKQ2CrypKFG3V7cpCIMtbetOQHCLV7qB KbAafSbYsKQxvid4GchwWna1lBcnuhLS0LD+wbEety2kzyXAHGg7sVMivYw6CCXCRe1v 0bIn+PIYzvFRtz8A/0yxqH8IbjyDTJkn64poRBwUzTJZvGaMINt68RTgF9ZZUWpgyK2u z85cz0+7FtPgaJ+FbwaxXkNU4MGQ5eu8VFofLdpTATJh0BqiLcq3szvGLy/qGajp2obI 2M5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746451591; x=1747056391; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=x+PRYOwtGGI2VCJ+ZRElyfMk1CW/EKThNxP9ZdoyV/8=; b=YYZh7bIRZvXbnh07VLzD46NC21tP8MBEE5W67pyB7Qp40f/hFQk1A7/6sX+OgnpBzC IUC0Iej2xERpSP4tzKmqGtexnR9Mgpx3kxcYHFPLTBKhxhT/U79c2k4k35ZdupObwV8E bUxcWYN6QkcB6gmrBp4AQRbuBtQGoPGC54iYvel4dwXbSIlBBGkmBOCjzHZ6PDimSWPD j5MIbuhL0ousm1QuTWjjYpXVl6BNNeQuoNpuKFTkZgmS01VGxP+qSi88+5CG1n6HxVxl TpDIxCx8JZAkGhSWWjXSaa28/3pWynmCVWZu6+rXSXAgCq6/vfq9+29F8HzIXj5HQASW 2v5Q== X-Forwarded-Encrypted: i=1; AJvYcCVcN+vrL4wA6JyT44m3p9PUU4z1J+hhiE1KphhHUQiSqDCk/arzqhLBtLAkHbrvrRfAAHVj81ltTrfuMumn@lists.postgresql.org X-Gm-Message-State: AOJu0YwNHieSGGoKpprAMYk1dq9hMcYYvs1qtYyy0UrWOk556ujcUkFB eRgpAhMLrQHvYZn74+b+8UsK/3alIc23iFPb7Pk1fgCPaJ1EiXlb4DJl2LF57EPZn0qKO4McaeL NSQizXC45h6N5skz+rbxfUNSjKFQr4CnT+gnYEg== X-Gm-Gg: ASbGnctOJaK7paGzQZ0+21W6+0D6C+O5bY7Iekz4+9E2tS/L1p7Hc44iVOKXleB6wDn hys7Vd6pyl7QTg81SFZW6L1KUn6RDAlJgJ/rFSFk7O+RvYM8kJ5+X0tg4v6TV7E7nBHtcy0taAx 1g2a7+lQiDRnfLoI2nX2u7eQ== X-Google-Smtp-Source: AGHT+IGWwdmb1Lhk/9h5ldwj6B7RjjxRRqpEJsARXSoX31c4YCF0kxs/SLodAQKBNmwVDYNpTMEqDHx9e7rzQ2WpkN8= X-Received: by 2002:a05:6a00:4c17:b0:73d:f9d2:9c64 with SMTP id d2e1a72fcca58-74057c54811mr18515348b3a.10.1746451591159; Mon, 05 May 2025 06:26:31 -0700 (PDT) MIME-Version: 1.0 References: <4ec38b6d-bd92-4055-8d2f-7efa583a2b9f@cloud.gatewaynet.com> <140687619.1448875.1746450591896@mail.yahoo.com> In-Reply-To: <140687619.1448875.1746450591896@mail.yahoo.com> From: =?UTF-8?Q?Mladen_Marinovi=C4=87?= Date: Mon, 5 May 2025 15:26:19 +0200 X-Gm-Features: ATxdqUGvOS0knyOM-gB2kZIf9NZnMKwNW3PfIT_Esn4rnx9VgsCx-9dT9e7JKN4 Message-ID: Subject: Re: Different execution plans in PG17 and pgBouncer... To: "Efrain J. Berdecia" Cc: SERHAD ERDEM , Achilleas Mantzios , "pgsql-general@lists.postgresql.org" Content-Type: multipart/alternative; boundary="000000000000b572fa0634637235" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000b572fa0634637235 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, Mystery not solved...but identified. The pool is in transaction mode and some connections use set enable_mergejoin=3Doff, but they do not set it bac= k to on. Upon getting the connection from the pool the parameter is still set to off causing the planner to not use this kind of join which results in different plans when using this tainted pgbouncer connection instead of the clean one from pg17. The problem is that server_reset_query is not used when the pool is in transaction mode. Now, we have to see how to fix this problem. Regards, Mladen Marinovi=C4=87 On Mon, May 5, 2025 at 3:10=E2=80=AFPM Efrain J. Berdecia wrote: > Is the query using parameter markers? Is the source executing the query > forcing a "bad" data type casting? > > Yahoo Mail: Search, Organize, Conquer > > > On Mon, May 5, 2025 at 8:52 AM, Mladen Marinovi=C4=87 > wrote: > > > On Mon, May 5, 2025 at 2:38=E2=80=AFPM SERHAD ERDEM = wrote: > > Hi , you had better try vacuum analyze for the whole db , pgbouncer > connection layer can not causeslow queries. > > > I did that already. But the slow query is the consequence of the differen= t > plan, not the statistics. > > > ------------------------------ > *From:* Mladen Marinovi=C4=87 > *Sent:* Monday, May 5, 2025 12:27 PM > *To:* Achilleas Mantzios > *Cc:* pgsql-general@lists.postgresql.org < > pgsql-general@lists.postgresql.org> > *Subject:* Re: Different execution plans in PG17 and pgBouncer... > > > > On Mon, May 5, 2025 at 12:07=E2=80=AFPM Achilleas Mantzios < > a.mantzios@cloud.gatewaynet.com> wrote: > > > On 5/5/25 11:00, Mladen Marinovi=C4=87 wrote: > > > > On Mon, May 5, 2025 at 11:24=E2=80=AFAM Achilleas Mantzios < > a.mantzios@cloud.gatewaynet.com> wrote: > > > On 5/5/25 09:52, Mladen Marinovi=C4=87 wrote: > > Hi, > > We recently migrated our production instances from PG11 to PG17. While > doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As > everything worked on the test servers we pushed this to production a few > weeks ago. We did not notice any problems until a few days ago (but the > problems were here from the start). The main manifestation of the problem= s > is a service that runs a fixed query to get a backlog of unprocessed data > (limited to a 1000 rows). When testing the query using pgAdmin connected > directly to the database we get a result in cca. 20 seconds. The same que= ry > runs for 2 hours when using pgBouncer to connect to the same database. > > > That's a huge jump, I hope you guys did extensive testing of your app. In > which language is your app written? If java, then define prepareThreshold= =3D0 > in your jdbc and set max_prepared_statements =3D 0 in pgbouncer. > > Mainly python, but the problem was noticed in a java service. > Prepare treshold was already set to 0. We changed the max_prepared_state= ments > to 0 from the default (200) but no change was noticed. > > How about search paths ? any difference on those between the two runs ? D= o > you set search_path in pgbouncer ? what is "cca." btw ? > > > The more interesting part is that when we issue an explain of the same > query we get different plans. We did this a few seconds apart so there > should be no difference in collected statistics. We ruled out prepared > statements, as we suspected the generic plan might be the problem, but it > is not. Is there any pgBouncer or PG17 parameter that might be the cause = of > this? > > > Does this spawn any connections (such as dblink) ? are there limits per > user/db pool_size in pgbouncer ? > > No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, > UPDATE, DELETE,...) There are limits, but they are not hit. The query jus= t > uses a different plan and runs slower because of that. > > Pgbouncer, in contrast to its old friend PgPool-II is completely passive, > just passes through SQL to the server as fast as possible as it can. But = I > am sure you know that. Good luck, keep us posted! > > Yes, that is what puzzles me. > > What is the pgbouncer's timeout in the server connections ? > > How about "idle in transaction" ? do you get any of those? What's the > isolation level ? > > How about the user ? is this the same user doing pgadmin queries VS via > the app ? > > Can you identify the user under which the problem is manifested and : > > ALTER user "unlucky_user" SET log_statement =3D 'all'; > > ALTER user "unlucky_user" SET log_min_duration_statement =3D 0; -- to hel= p > you debug the prepared statements .. just in case , and other stuff not > printed by log_statement =3D all. > > None of those parameters should affect the fact that when issuing the > explain select query (the statement is not prepared) from psql directly > gives a different result than issuing it over the pgbouncer connection. T= he > result is repeatable. > > We have rolled back pgbouncer to 1.12. and it seems the problem persists. > This is one of the weirdest things I have ever seen with PostgreSQL. > > > Regards, > Mladen Marinovi=C4=87 > > --000000000000b572fa0634637235 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

Mystery not solved...but= identified. The pool is in transaction mode and some connections use set e= nable_mergejoin=3Doff, but they do not set it back to on. Upon getting the = connection from the pool the parameter is still set to off causing the plan= ner to not use this kind of join which results in different plans when usin= g this tainted pgbouncer connection instead of the clean one from pg17.

The problem is that=20 server_reset_query is not used when the pool is in transaction mode. Now, w= e have to see how to fix this problem.

Regards,
Mladen Marinovi=C4=87

On Mon, May 5, 202= 5 at 3:10=E2=80=AFPM Efrain J. Berdecia <ejberdecia@yahoo.com> wrote:
Is the query using parameter markers? Is the s= ource executing the query forcing a "bad" data type casting?
<= br>
On Mon, May 5, 2025 at 8:52 AM, Mladen Marinovi=C4=87
<marin@kset.org> wrote:


Hi=C2=A0 , you had better try=C2=A0 vacuum=C2=A0 analyze for the whole db ,= =C2=A0=C2=A0 pgbouncer=C2=A0 connection layer can not causeslow queries.

I did that alre= ady. But the slow query is the consequence of the different plan, not the s= tatistics.
<= div>=C2=A0

From:=C2=A0Mladen Marinovi=C4=87 <marin@kset.org>
Sent:=C2=A0Monday, May 5, 2025 12:27 PM
To:=C2=A0Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com>
Cc:=C2=A0pgsql-gene= ral@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject:=C2=A0Re: Different execution plans in PG17 and pgBouncer...=
=C2=A0


On Mon, May 5, 2025 at 12:07=E2=80=AFPM Achill= eas Mantzios <a.mantzios@cloud.gatewaynet.com> wrote:


On 5/5/25 11:00, Mladen Marinovi=C4=87 wrote:<= /div>


On Mon, May 5, 2025 at 11:24=E2=80=AFAM Achill= eas Mantzios <a.mantzios@cloud.gatewaynet.com> wrote:


On 5/5/25 09:52, Mladen Marinovi=C4=87 wrote:<= /div>
Hi,

We recently migrated our production instances = from PG11 to PG17. While doing so we upgraded our pgBouncer instances from = 1.12 to 1.24. As everything worked on the test servers we pushed this to pr= oduction a few weeks ago. We did not notice any problems until a few days ago (but the problems were here f= rom the start). The main manifestation of the problems is a service that ru= ns a fixed query to get a backlog of unprocessed data (limited to a 1000 ro= ws). When testing the query using pgAdmin connected directly to the database we get a result in cca. 20 seco= nds. The same query runs for 2 hours when using pgBouncer to connect to the= same database.


That's a huge jump, I hope you guys did exte= nsive testing of your app. In which language is your app written? If java, = then define prepareThreshold=3D0 in your jdbc and set max_prepared_statements =3D 0 in pgbouncer.

Mainly python, but the problem was noticed in = a java service.
Prepare treshold was already set to 0. We chan= ged the=C2=A0 max_prepared_statements to 0 from the default (200) but no change was noticed.

How about search paths ? any difference on those= between the two runs ? Do you set search_path in pgbouncer ? what is "= ;cca." btw ?


The more interesting part is that when we issu= e an explain of the same query we get different plans. We did this a few se= conds apart so there should be no difference in collected statistics. We ru= led out prepared statements, as we suspected the generic plan might be the problem, but it is not. Is ther= e any pgBouncer or PG17 parameter that might be the cause of this?


Does this spawn any connections (such as dblink)= ? are there limits per user/db pool_size in pgbouncer ?

No additional connection nor dbling. Just plai= n SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...) There are limits, but they = are not hit. The query just uses a different plan and runs slower because o= f that.

Pgbouncer, in contrast to its old friend PgPool-= II is completely passive, just passes through SQL to the server as fast as = possible as it can. But I am sure you know that. Good luck, keep us posted!=

Yes, that is what puzzles me.

What is the pgbouncer's timeout in the serve= r connections ?

How about "idle in transaction" ? do y= ou get any of those? What's the isolation level ?

How about the user ? is this the same user doing= pgadmin queries VS via the app ?

Can you identify the user under which the proble= m is manifested and :

ALTER user "unlucky_user" SET log_stat= ement =3D 'all';

ALTER user "unlucky_user" SET log_min_= duration_statement =3D 0; -- to help you debug the prepared statements .. j= ust in case , and other stuff not printed by log_statement =3D all.

None of those parameters should affect the fac= t that when issuing the explain select query (the statement is not prepared= ) from psql directly gives a different result than issuing it over the pgbo= uncer connection. The result is repeatable.

We have rolled back pgbouncer to 1.12. and it = seems the problem persists. This is one of the weirdest things I have ever = seen with PostgreSQL.
=C2=A0
Regards,
Mladen Marinovi=C4=87
--000000000000b572fa0634637235--