public inbox for [email protected]help / color / mirror / Atom feed
Re: Different execution plans in PG17 and pgBouncer... 13+ messages / 4 participants [nested] [flat]
* Re: Different execution plans in PG17 and pgBouncer... @ 2025-05-05 09:24 Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 0 siblings, 1 reply; 13+ messages in thread From: Achilleas Mantzios @ 2025-05-05 09:24 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; pgsql-general On 5/5/25 09:52, Mladen Marinović wrote: > Hi, > > We recently migrated our production instances from PG11 to PG17. While > doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As > everything worked on the test servers we pushed this to production a > few weeks ago. We did not notice any problems until a few days ago > (but the problems were here from the start). The main manifestation of > the problems is a service that runs a fixed query to get a backlog of > unprocessed data (limited to a 1000 rows). When testing the query > using pgAdmin connected directly to the database we get a result in > cca. 20 seconds. The same query runs for 2 hours when using pgBouncer > to connect to the same database. That's a huge jump, I hope you guys did extensive testing of your app. In which language is your app written? If java, then define prepareThreshold=0 in your jdbc and set max_prepared_statements = 0 in pgbouncer. How about search paths ? any difference on those between the two runs ? Do you set search_path in pgbouncer ? what is "cca." btw ? > > The more interesting part is that when we issue an explain of the same > query we get different plans. We did this a few seconds apart so there > should be no difference in collected statistics. We ruled out prepared > statements, as we suspected the generic plan might be the problem, but > it is not. Is there any pgBouncer or PG17 parameter that might be the > cause of this? Does this spawn any connections (such as dblink) ? are there limits per user/db pool_size in pgbouncer ? Pgbouncer, in contrast to its old friend PgPool-II is completely passive, just passes through SQL to the server as fast as possible as it can. But I am sure you know that. Good luck, keep us posted! > > Regards, > Mladen Marinović ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> @ 2025-05-05 10:07 ` Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:29 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 0 siblings, 2 replies; 13+ messages in thread From: Achilleas Mantzios @ 2025-05-05 10:07 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; [email protected] <[email protected]> On 5/5/25 11:00, Mladen Marinović wrote: > > > On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios > <[email protected]> wrote: > > > On 5/5/25 09:52, Mladen Marinović wrote: >> Hi, >> >> We recently migrated our production instances from PG11 to PG17. >> While doing so we upgraded our pgBouncer instances from 1.12 to >> 1.24. As everything worked on the test servers we pushed this to >> production a few weeks ago. We did not notice any problems until >> a few days ago (but the problems were here from the start). The >> main manifestation of the problems is a service that runs a fixed >> query to get a backlog of unprocessed data (limited to a 1000 >> rows). When testing the query using pgAdmin connected directly to >> the database we get a result in cca. 20 seconds. The same query >> runs for 2 hours when using pgBouncer to connect to the same >> database. > > > That's a huge jump, I hope you guys did extensive testing of your > app. In which language is your app written? If java, then define > prepareThreshold=0 in your jdbc and set max_prepared_statements = > 0 in pgbouncer. > > Mainly python, but the problem was noticed in a java service. > Prepare treshold was already set to 0. We changed the > max_prepared_statements to 0 from the default (200) but no change was > noticed. > > How about search paths ? any difference on those between the two > runs ? Do you set search_path in pgbouncer ? what is "cca." btw ? > >> >> The more interesting part is that when we issue an explain of the >> same query we get different plans. We did this a few seconds >> apart so there should be no difference in collected statistics. >> We ruled out prepared statements, as we suspected the generic >> plan might be the problem, but it is not. Is there any pgBouncer >> or PG17 parameter that might be the cause of this? > > > Does this spawn any connections (such as dblink) ? are there > limits per user/db pool_size in pgbouncer ? > > No additional connection nor dbling. Just plain SQL (CTE, SELECT, > INSERT, UPDATE, DELETE,...) There are limits, but they are not hit. > The query just uses a different plan and runs slower because of that. > > Pgbouncer, in contrast to its old friend PgPool-II is completely > passive, just passes through SQL to the server as fast as possible > as it can. But I am sure you know that. Good luck, keep us posted! > > Yes, that is what puzzles me. What is the pgbouncer's timeout in the server connections ? How about "idle in transaction" ? do you get any of those? What's the isolation level ? How about the user ? is this the same user doing pgadmin queries VS via the app ? Can you identify the user under which the problem is manifested and : ALTER user "unlucky_user" SET log_statement = 'all'; ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help you debug the prepared statements .. just in case , and other stuff not printed by log_statement = all. > Regards, > Mladen Marinović ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> @ 2025-05-05 12:27 ` Mladen Marinović <[email protected]> 2025-05-05 12:36 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:38 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 1 sibling, 2 replies; 13+ messages in thread From: Mladen Marinović @ 2025-05-05 12:27 UTC (permalink / raw) To: Achilleas Mantzios <[email protected]>; +Cc: [email protected] <[email protected]> On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios < [email protected]> wrote: > > On 5/5/25 11:00, Mladen Marinović wrote: > > > > On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios < > [email protected]> wrote: > >> >> On 5/5/25 09:52, Mladen Marinović wrote: >> >> Hi, >> >> We recently migrated our production instances from PG11 to PG17. While >> doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As >> everything worked on the test servers we pushed this to production a few >> weeks ago. We did not notice any problems until a few days ago (but the >> problems were here from the start). The main manifestation of the problems >> is a service that runs a fixed query to get a backlog of unprocessed data >> (limited to a 1000 rows). When testing the query using pgAdmin connected >> directly to the database we get a result in cca. 20 seconds. The same query >> runs for 2 hours when using pgBouncer to connect to the same database. >> >> >> That's a huge jump, I hope you guys did extensive testing of your app. In >> which language is your app written? If java, then define prepareThreshold=0 >> in your jdbc and set max_prepared_statements = 0 in pgbouncer. >> > Mainly python, but the problem was noticed in a java service. > Prepare treshold was already set to 0. We changed the max_prepared_statements > to 0 from the default (200) but no change was noticed. > >> How about search paths ? any difference on those between the two runs ? >> Do you set search_path in pgbouncer ? what is "cca." btw ? >> >> >> The more interesting part is that when we issue an explain of the same >> query we get different plans. We did this a few seconds apart so there >> should be no difference in collected statistics. We ruled out prepared >> statements, as we suspected the generic plan might be the problem, but it >> is not. Is there any pgBouncer or PG17 parameter that might be the cause of >> this? >> >> >> Does this spawn any connections (such as dblink) ? are there limits per >> user/db pool_size in pgbouncer ? >> > No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, > UPDATE, DELETE,...) There are limits, but they are not hit. The query just > uses a different plan and runs slower because of that. > >> Pgbouncer, in contrast to its old friend PgPool-II is completely passive, >> just passes through SQL to the server as fast as possible as it can. But I >> am sure you know that. Good luck, keep us posted! >> > Yes, that is what puzzles me. > > What is the pgbouncer's timeout in the server connections ? > > How about "idle in transaction" ? do you get any of those? What's the > isolation level ? > > How about the user ? is this the same user doing pgadmin queries VS via > the app ? > > Can you identify the user under which the problem is manifested and : > > ALTER user "unlucky_user" SET log_statement = 'all'; > > ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help > you debug the prepared statements .. just in case , and other stuff not > printed by log_statement = all. > > None of those parameters should affect the fact that when issuing the explain select query (the statement is not prepared) from psql directly gives a different result than issuing it over the pgbouncer connection. The result is repeatable. We have rolled back pgbouncer to 1.12. and it seems the problem persists. This is one of the weirdest things I have ever seen with PostgreSQL. > > Regards, > Mladen Marinović > > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> @ 2025-05-05 12:36 ` Achilleas Mantzios <[email protected]> 2025-05-05 12:51 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 1 sibling, 1 reply; 13+ messages in thread From: Achilleas Mantzios @ 2025-05-05 12:36 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; +Cc: [email protected] <[email protected]> On 5/5/25 13:27, Mladen Marinović wrote: > > > On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios > <[email protected]> wrote: > > > On 5/5/25 11:00, Mladen Marinović wrote: >> >> >> On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios >> <[email protected]> wrote: >> >> >> On 5/5/25 09:52, Mladen Marinović wrote: >>> Hi, >>> >>> We recently migrated our production instances from PG11 to >>> PG17. While doing so we upgraded our pgBouncer instances >>> from 1.12 to 1.24. As everything worked on the test servers >>> we pushed this to production a few weeks ago. We did not >>> notice any problems until a few days ago (but the problems >>> were here from the start). The main manifestation of the >>> problems is a service that runs a fixed query to get a >>> backlog of unprocessed data (limited to a 1000 rows). When >>> testing the query using pgAdmin connected directly to the >>> database we get a result in cca. 20 seconds. The same query >>> runs for 2 hours when using pgBouncer to connect to the same >>> database. >> >> >> That's a huge jump, I hope you guys did extensive testing of >> your app. In which language is your app written? If java, >> then define prepareThreshold=0 in your jdbc and set >> max_prepared_statements = 0 in pgbouncer. >> >> Mainly python, but the problem was noticed in a java service. >> Prepare treshold was already set to 0. We changed the >> max_prepared_statements to 0 from the default (200) but no change >> was noticed. >> >> How about search paths ? any difference on those between the >> two runs ? Do you set search_path in pgbouncer ? what is >> "cca." btw ? >> >>> >>> The more interesting part is that when we issue an explain >>> of the same query we get different plans. We did this a few >>> seconds apart so there should be no difference in collected >>> statistics. We ruled out prepared statements, as we >>> suspected the generic plan might be the problem, but it is >>> not. Is there any pgBouncer or PG17 parameter that might be >>> the cause of this? >> >> >> Does this spawn any connections (such as dblink) ? are there >> limits per user/db pool_size in pgbouncer ? >> >> No additional connection nor dbling. Just plain SQL (CTE, SELECT, >> INSERT, UPDATE, DELETE,...) There are limits, but they are not >> hit. The query just uses a different plan and runs slower because >> of that. >> >> Pgbouncer, in contrast to its old friend PgPool-II is >> completely passive, just passes through SQL to the server as >> fast as possible as it can. But I am sure you know that. Good >> luck, keep us posted! >> >> Yes, that is what puzzles me. > > What is the pgbouncer's timeout in the server connections ? > > How about "idle in transaction" ? do you get any of those? What's > the isolation level ? > > How about the user ? is this the same user doing pgadmin queries > VS via the app ? > > Can you identify the user under which the problem is manifested and : > > ALTER user "unlucky_user" SET log_statement = 'all'; > > ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- > to help you debug the prepared statements .. just in case , and > other stuff not printed by log_statement = all. > > None of those parameters should affect the fact that when issuing the > explain select query (the statement is not prepared) from psql > directly gives a different result than issuing it over the pgbouncer > connection. The result is repeatable. > > We have rolled back pgbouncer to 1.12. and it seems the problem > persists. This is one of the weirdest things I have ever seen with > PostgreSQL. ok, this is something, at least one more extreme thought ruled out. How about search_path ? is this the SAME user that is issuing the statements in pgadmin VS pgbouncer ? Is there a connect_query inside pgbouncer's conf ? you have to show all configuration involved and also full logging on the backend for said user. >> Regards, >> Mladen Marinović > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:36 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> @ 2025-05-05 12:51 ` Mladen Marinović <[email protected]> 0 siblings, 0 replies; 13+ messages in thread From: Mladen Marinović @ 2025-05-05 12:51 UTC (permalink / raw) To: Achilleas Mantzios <[email protected]>; +Cc: [email protected] <[email protected]> On Mon, May 5, 2025 at 2:36 PM Achilleas Mantzios < [email protected]> wrote: > > On 5/5/25 13:27, Mladen Marinović wrote: > > > > On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios < > [email protected]> wrote: > >> >> On 5/5/25 11:00, Mladen Marinović wrote: >> >> >> >> On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios < >> [email protected]> wrote: >> >>> >>> On 5/5/25 09:52, Mladen Marinović wrote: >>> >>> Hi, >>> >>> We recently migrated our production instances from PG11 to PG17. While >>> doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As >>> everything worked on the test servers we pushed this to production a few >>> weeks ago. We did not notice any problems until a few days ago (but the >>> problems were here from the start). The main manifestation of the problems >>> is a service that runs a fixed query to get a backlog of unprocessed data >>> (limited to a 1000 rows). When testing the query using pgAdmin connected >>> directly to the database we get a result in cca. 20 seconds. The same query >>> runs for 2 hours when using pgBouncer to connect to the same database. >>> >>> >>> That's a huge jump, I hope you guys did extensive testing of your app. >>> In which language is your app written? If java, then define >>> prepareThreshold=0 in your jdbc and set max_prepared_statements = 0 in >>> pgbouncer. >>> >> Mainly python, but the problem was noticed in a java service. >> Prepare treshold was already set to 0. We changed the max_prepared_statements >> to 0 from the default (200) but no change was noticed. >> >>> How about search paths ? any difference on those between the two runs ? >>> Do you set search_path in pgbouncer ? what is "cca." btw ? >>> >>> >>> The more interesting part is that when we issue an explain of the same >>> query we get different plans. We did this a few seconds apart so there >>> should be no difference in collected statistics. We ruled out prepared >>> statements, as we suspected the generic plan might be the problem, but it >>> is not. Is there any pgBouncer or PG17 parameter that might be the cause of >>> this? >>> >>> >>> Does this spawn any connections (such as dblink) ? are there limits per >>> user/db pool_size in pgbouncer ? >>> >> No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, >> UPDATE, DELETE,...) There are limits, but they are not hit. The query just >> uses a different plan and runs slower because of that. >> >>> Pgbouncer, in contrast to its old friend PgPool-II is completely >>> passive, just passes through SQL to the server as fast as possible as it >>> can. But I am sure you know that. Good luck, keep us posted! >>> >> Yes, that is what puzzles me. >> >> What is the pgbouncer's timeout in the server connections ? >> >> How about "idle in transaction" ? do you get any of those? What's the >> isolation level ? >> >> How about the user ? is this the same user doing pgadmin queries VS via >> the app ? >> >> Can you identify the user under which the problem is manifested and : >> >> ALTER user "unlucky_user" SET log_statement = 'all'; >> >> ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help >> you debug the prepared statements .. just in case , and other stuff not >> printed by log_statement = all. >> > None of those parameters should affect the fact that when issuing the > explain select query (the statement is not prepared) from psql directly > gives a different result than issuing it over the pgbouncer connection. The > result is repeatable. > > We have rolled back pgbouncer to 1.12. and it seems the problem persists. > This is one of the weirdest things I have ever seen with PostgreSQL. > > > ok, this is something, at least one more extreme thought ruled out. How > about search_path ? is this the SAME user that is issuing the statements > in pgadmin VS pgbouncer ? > The user is the same, the search path is the same: show search_path ; search_path ----------------- "$user", public (1 row) There is only one schema (public) so there is no posibility that one connection uses different tables. As I remember correctly the planner uses the postgresql parameters (from the conf file visible in pg_settings) and table/row statistics to choose a plan. It is unclear to me how those numbers can be different when using pgbouncer (I have tried selecting the configuration using both connections and it is the same). > Is there a connect_query inside pgbouncer's conf ? > The connect_query parameter is not used. > you have to show all configuration involved and also full logging on the > backend for said user. > I can provide redacted explains if it would help. Full logging is not feasible. > > >> Regards, >> Mladen Marinović >> >> ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> @ 2025-05-05 12:38 ` SERHAD ERDEM <[email protected]> 2025-05-05 12:52 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 1 sibling, 1 reply; 13+ messages in thread From: SERHAD ERDEM @ 2025-05-05 12:38 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; Achilleas Mantzios <[email protected]>; +Cc: [email protected] <[email protected]> Hi , you had better try vacuum analyze for the whole db , pgbouncer connection layer can not causeslow queries. ________________________________ From: Mladen Marinović <[email protected]> Sent: Monday, May 5, 2025 12:27 PM To: Achilleas Mantzios <[email protected]> Cc: [email protected] <[email protected]> Subject: Re: Different execution plans in PG17 and pgBouncer... On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios <[email protected]<mailto:[email protected]>> wrote: On 5/5/25 11:00, Mladen Marinović wrote: On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios <[email protected]<mailto:[email protected]>> wrote: On 5/5/25 09:52, Mladen Marinović wrote: Hi, We recently migrated our production instances from PG11 to PG17. While doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As everything worked on the test servers we pushed this to production a few weeks ago. We did not notice any problems until a few days ago (but the problems were here from the start). The main manifestation of the problems is a service that runs a fixed query to get a backlog of unprocessed data (limited to a 1000 rows). When testing the query using pgAdmin connected directly to the database we get a result in cca. 20 seconds. The same query runs for 2 hours when using pgBouncer to connect to the same database. That's a huge jump, I hope you guys did extensive testing of your app. In which language is your app written? If java, then define prepareThreshold=0 in your jdbc and set max_prepared_statements = 0 in pgbouncer. Mainly python, but the problem was noticed in a java service. Prepare treshold was already set to 0. We changed the max_prepared_statements to 0 from the default (200) but no change was noticed. How about search paths ? any difference on those between the two runs ? Do you set search_path in pgbouncer ? what is "cca." btw ? The more interesting part is that when we issue an explain of the same query we get different plans. We did this a few seconds apart so there should be no difference in collected statistics. We ruled out prepared statements, as we suspected the generic plan might be the problem, but it is not. Is there any pgBouncer or PG17 parameter that might be the cause of this? Does this spawn any connections (such as dblink) ? are there limits per user/db pool_size in pgbouncer ? No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...) There are limits, but they are not hit. The query just uses a different plan and runs slower because of that. Pgbouncer, in contrast to its old friend PgPool-II is completely passive, just passes through SQL to the server as fast as possible as it can. But I am sure you know that. Good luck, keep us posted! Yes, that is what puzzles me. What is the pgbouncer's timeout in the server connections ? How about "idle in transaction" ? do you get any of those? What's the isolation level ? How about the user ? is this the same user doing pgadmin queries VS via the app ? Can you identify the user under which the problem is manifested and : ALTER user "unlucky_user" SET log_statement = 'all'; ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help you debug the prepared statements .. just in case , and other stuff not printed by log_statement = all. None of those parameters should affect the fact that when issuing the explain select query (the statement is not prepared) from psql directly gives a different result than issuing it over the pgbouncer connection. The result is repeatable. We have rolled back pgbouncer to 1.12. and it seems the problem persists. This is one of the weirdest things I have ever seen with PostgreSQL. Regards, Mladen Marinović ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:38 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> @ 2025-05-05 12:52 ` Mladen Marinović <[email protected]> 2025-05-05 13:07 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 2025-05-05 13:09 ` Re: Different execution plans in PG17 and pgBouncer... Efrain J. Berdecia <[email protected]> 0 siblings, 2 replies; 13+ messages in thread From: Mladen Marinović @ 2025-05-05 12:52 UTC (permalink / raw) To: SERHAD ERDEM <[email protected]>; +Cc: Achilleas Mantzios <[email protected]>; [email protected] <[email protected]> On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM <[email protected]> wrote: > Hi , you had better try vacuum analyze for the whole db , pgbouncer > connection layer can not causeslow queries. > I did that already. But the slow query is the consequence of the different plan, not the statistics. > ------------------------------ > *From:* Mladen Marinović <[email protected]> > *Sent:* Monday, May 5, 2025 12:27 PM > *To:* Achilleas Mantzios <[email protected]> > *Cc:* [email protected] < > [email protected]> > *Subject:* Re: Different execution plans in PG17 and pgBouncer... > > > > On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios < > [email protected]> wrote: > > > On 5/5/25 11:00, Mladen Marinović wrote: > > > > On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios < > [email protected]> wrote: > > > On 5/5/25 09:52, Mladen Marinović wrote: > > Hi, > > We recently migrated our production instances from PG11 to PG17. While > doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As > everything worked on the test servers we pushed this to production a few > weeks ago. We did not notice any problems until a few days ago (but the > problems were here from the start). The main manifestation of the problems > is a service that runs a fixed query to get a backlog of unprocessed data > (limited to a 1000 rows). When testing the query using pgAdmin connected > directly to the database we get a result in cca. 20 seconds. The same query > runs for 2 hours when using pgBouncer to connect to the same database. > > > That's a huge jump, I hope you guys did extensive testing of your app. In > which language is your app written? If java, then define prepareThreshold=0 > in your jdbc and set max_prepared_statements = 0 in pgbouncer. > > Mainly python, but the problem was noticed in a java service. > Prepare treshold was already set to 0. We changed the max_prepared_statements > to 0 from the default (200) but no change was noticed. > > How about search paths ? any difference on those between the two runs ? Do > you set search_path in pgbouncer ? what is "cca." btw ? > > > The more interesting part is that when we issue an explain of the same > query we get different plans. We did this a few seconds apart so there > should be no difference in collected statistics. We ruled out prepared > statements, as we suspected the generic plan might be the problem, but it > is not. Is there any pgBouncer or PG17 parameter that might be the cause of > this? > > > Does this spawn any connections (such as dblink) ? are there limits per > user/db pool_size in pgbouncer ? > > No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, > UPDATE, DELETE,...) There are limits, but they are not hit. The query just > uses a different plan and runs slower because of that. > > Pgbouncer, in contrast to its old friend PgPool-II is completely passive, > just passes through SQL to the server as fast as possible as it can. But I > am sure you know that. Good luck, keep us posted! > > Yes, that is what puzzles me. > > What is the pgbouncer's timeout in the server connections ? > > How about "idle in transaction" ? do you get any of those? What's the > isolation level ? > > How about the user ? is this the same user doing pgadmin queries VS via > the app ? > > Can you identify the user under which the problem is manifested and : > > ALTER user "unlucky_user" SET log_statement = 'all'; > > ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help > you debug the prepared statements .. just in case , and other stuff not > printed by log_statement = all. > > None of those parameters should affect the fact that when issuing the > explain select query (the statement is not prepared) from psql directly > gives a different result than issuing it over the pgbouncer connection. The > result is repeatable. > > We have rolled back pgbouncer to 1.12. and it seems the problem persists. > This is one of the weirdest things I have ever seen with PostgreSQL. > > > Regards, > Mladen Marinović > > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:38 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 2025-05-05 12:52 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> @ 2025-05-05 13:07 ` SERHAD ERDEM <[email protected]> 1 sibling, 0 replies; 13+ messages in thread From: SERHAD ERDEM @ 2025-05-05 13:07 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; +Cc: Achilleas Mantzios <[email protected]>; [email protected] <[email protected]> Hi, if you are sure that exac plans are the same , try the model of select count (*) from type , instead of select * from limit; you may understand that the problem it is due to returning rows or not. ________________________________ From: Mladen Marinović <[email protected]> Sent: Monday, May 5, 2025 12:52 PM To: SERHAD ERDEM <[email protected]> Cc: Achilleas Mantzios <[email protected]>; [email protected] <[email protected]> Subject: Re: Different execution plans in PG17 and pgBouncer... On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM <[email protected]<mailto:[email protected]>> wrote: Hi , you had better try vacuum analyze for the whole db , pgbouncer connection layer can not causeslow queries. I did that already. But the slow query is the consequence of the different plan, not the statistics. ________________________________ From: Mladen Marinović <[email protected]<mailto:[email protected]>> Sent: Monday, May 5, 2025 12:27 PM To: Achilleas Mantzios <[email protected]<mailto:[email protected]>> Cc: [email protected]<mailto:[email protected]> <[email protected]<mailto:[email protected]>> Subject: Re: Different execution plans in PG17 and pgBouncer... On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios <[email protected]<mailto:[email protected]>> wrote: On 5/5/25 11:00, Mladen Marinović wrote: On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios <[email protected]<mailto:[email protected]>> wrote: On 5/5/25 09:52, Mladen Marinović wrote: Hi, We recently migrated our production instances from PG11 to PG17. While doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As everything worked on the test servers we pushed this to production a few weeks ago. We did not notice any problems until a few days ago (but the problems were here from the start). The main manifestation of the problems is a service that runs a fixed query to get a backlog of unprocessed data (limited to a 1000 rows). When testing the query using pgAdmin connected directly to the database we get a result in cca. 20 seconds. The same query runs for 2 hours when using pgBouncer to connect to the same database. That's a huge jump, I hope you guys did extensive testing of your app. In which language is your app written? If java, then define prepareThreshold=0 in your jdbc and set max_prepared_statements = 0 in pgbouncer. Mainly python, but the problem was noticed in a java service. Prepare treshold was already set to 0. We changed the max_prepared_statements to 0 from the default (200) but no change was noticed. How about search paths ? any difference on those between the two runs ? Do you set search_path in pgbouncer ? what is "cca." btw ? The more interesting part is that when we issue an explain of the same query we get different plans. We did this a few seconds apart so there should be no difference in collected statistics. We ruled out prepared statements, as we suspected the generic plan might be the problem, but it is not. Is there any pgBouncer or PG17 parameter that might be the cause of this? Does this spawn any connections (such as dblink) ? are there limits per user/db pool_size in pgbouncer ? No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...) There are limits, but they are not hit. The query just uses a different plan and runs slower because of that. Pgbouncer, in contrast to its old friend PgPool-II is completely passive, just passes through SQL to the server as fast as possible as it can. But I am sure you know that. Good luck, keep us posted! Yes, that is what puzzles me. What is the pgbouncer's timeout in the server connections ? How about "idle in transaction" ? do you get any of those? What's the isolation level ? How about the user ? is this the same user doing pgadmin queries VS via the app ? Can you identify the user under which the problem is manifested and : ALTER user "unlucky_user" SET log_statement = 'all'; ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help you debug the prepared statements .. just in case , and other stuff not printed by log_statement = all. None of those parameters should affect the fact that when issuing the explain select query (the statement is not prepared) from psql directly gives a different result than issuing it over the pgbouncer connection. The result is repeatable. We have rolled back pgbouncer to 1.12. and it seems the problem persists. This is one of the weirdest things I have ever seen with PostgreSQL. Regards, Mladen Marinović ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:38 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 2025-05-05 12:52 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> @ 2025-05-05 13:09 ` Efrain J. Berdecia <[email protected]> 2025-05-05 13:26 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 1 sibling, 1 reply; 13+ messages in thread From: Efrain J. Berdecia @ 2025-05-05 13:09 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; SERHAD ERDEM <[email protected]>; +Cc: Achilleas Mantzios <[email protected]>; [email protected] <[email protected]> Is the query using parameter markers? Is the source executing the query forcing a "bad" data type casting? Yahoo Mail: Search, Organize, Conquer On Mon, May 5, 2025 at 8:52 AM, Mladen Marinović<[email protected]> wrote: On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM <[email protected]> wrote: Hi , you had better try vacuum analyze for the whole db , pgbouncer connection layer can not causeslow queries. I did that already. But the slow query is the consequence of the different plan, not the statistics. From: Mladen Marinović <[email protected]> Sent: Monday, May 5, 2025 12:27 PM To: Achilleas Mantzios <[email protected]> Cc: [email protected] <[email protected]> Subject: Re: Different execution plans in PG17 and pgBouncer... On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios <[email protected]> wrote: On 5/5/25 11:00, Mladen Marinović wrote: On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios <[email protected]> wrote: On 5/5/25 09:52, Mladen Marinović wrote: Hi, We recently migrated our production instances from PG11 to PG17. While doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As everything worked on the test servers we pushed this to production a few weeks ago. We did not notice any problems until a few days ago (but the problems were here from the start). The main manifestation of the problems is a service that runs a fixed query to get a backlog of unprocessed data (limited to a 1000 rows). When testing the query using pgAdmin connected directly to the database we get a result in cca. 20 seconds. The same query runs for 2 hours when using pgBouncer to connect to the same database. That's a huge jump, I hope you guys did extensive testing of your app. In which language is your app written? If java, then define prepareThreshold=0 in your jdbc and setmax_prepared_statements = 0in pgbouncer. Mainly python, but the problem was noticed in a java service.Prepare treshold was already set to 0. We changed the max_prepared_statementsto 0 from the default (200) but no change was noticed. How about search paths ? any difference on those between the two runs ? Do you set search_path in pgbouncer ? what is "cca." btw ? The more interesting part is that when we issue an explain of the same query we get different plans. We did this a few seconds apart so there should be no difference in collected statistics. We ruled out prepared statements, as we suspected the generic plan might be the problem, but it is not. Is there any pgBouncer or PG17 parameter that might be the cause of this? Does this spawn any connections (such as dblink) ? are there limits per user/db pool_size in pgbouncer ? No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...) There are limits, but they are not hit. The query just uses a different plan and runs slower because of that. Pgbouncer, in contrast to its old friend PgPool-II is completely passive, just passes through SQL to the server as fast as possible as it can. But I am sure you know that. Good luck, keep us posted! Yes, that is what puzzles me. What is the pgbouncer's timeout in the server connections ? How about "idle in transaction" ? do you get any of those? What's the isolation level ? How about the user ? is this the same user doing pgadmin queries VS via the app ? Can you identify the user under which the problem is manifested and : ALTER user "unlucky_user" SET log_statement = 'all'; ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help you debug the prepared statements .. just in case , and other stuff not printed by log_statement = all. None of those parameters should affect the fact that when issuing the explain select query (the statement is not prepared) from psql directly gives a different result than issuing it over the pgbouncer connection. The result is repeatable. We have rolled back pgbouncer to 1.12. and it seems the problem persists. This is one of the weirdest things I have ever seen with PostgreSQL. Regards,Mladen Marinović ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:38 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 2025-05-05 12:52 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 13:09 ` Re: Different execution plans in PG17 and pgBouncer... Efrain J. Berdecia <[email protected]> @ 2025-05-05 13:26 ` Mladen Marinović <[email protected]> 2025-05-05 13:36 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 2025-05-05 14:09 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 0 siblings, 2 replies; 13+ messages in thread From: Mladen Marinović @ 2025-05-05 13:26 UTC (permalink / raw) To: Efrain J. Berdecia <[email protected]>; +Cc: SERHAD ERDEM <[email protected]>; Achilleas Mantzios <[email protected]>; [email protected] <[email protected]> Hi, Mystery not solved...but identified. The pool is in transaction mode and some connections use set enable_mergejoin=off, but they do not set it back to on. Upon getting the connection from the pool the parameter is still set to off causing the planner to not use this kind of join which results in different plans when using this tainted pgbouncer connection instead of the clean one from pg17. The problem is that server_reset_query is not used when the pool is in transaction mode. Now, we have to see how to fix this problem. Regards, Mladen Marinović On Mon, May 5, 2025 at 3:10 PM Efrain J. Berdecia <[email protected]> wrote: > Is the query using parameter markers? Is the source executing the query > forcing a "bad" data type casting? > > Yahoo Mail: Search, Organize, Conquer > <https://mail.onelink.me/107872968?pid=nativeplacement&c=US_Acquisition_YMktg_315_SearchOrgConque...; > > On Mon, May 5, 2025 at 8:52 AM, Mladen Marinović > <[email protected]> wrote: > > > On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM <[email protected]> wrote: > > Hi , you had better try vacuum analyze for the whole db , pgbouncer > connection layer can not causeslow queries. > > > I did that already. But the slow query is the consequence of the different > plan, not the statistics. > > > ------------------------------ > *From:* Mladen Marinović <[email protected]> > *Sent:* Monday, May 5, 2025 12:27 PM > *To:* Achilleas Mantzios <[email protected]> > *Cc:* [email protected] < > [email protected]> > *Subject:* Re: Different execution plans in PG17 and pgBouncer... > > > > On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios < > [email protected]> wrote: > > > On 5/5/25 11:00, Mladen Marinović wrote: > > > > On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios < > [email protected]> wrote: > > > On 5/5/25 09:52, Mladen Marinović wrote: > > Hi, > > We recently migrated our production instances from PG11 to PG17. While > doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As > everything worked on the test servers we pushed this to production a few > weeks ago. We did not notice any problems until a few days ago (but the > problems were here from the start). The main manifestation of the problems > is a service that runs a fixed query to get a backlog of unprocessed data > (limited to a 1000 rows). When testing the query using pgAdmin connected > directly to the database we get a result in cca. 20 seconds. The same query > runs for 2 hours when using pgBouncer to connect to the same database. > > > That's a huge jump, I hope you guys did extensive testing of your app. In > which language is your app written? If java, then define prepareThreshold=0 > in your jdbc and set max_prepared_statements = 0 in pgbouncer. > > Mainly python, but the problem was noticed in a java service. > Prepare treshold was already set to 0. We changed the max_prepared_statements > to 0 from the default (200) but no change was noticed. > > How about search paths ? any difference on those between the two runs ? Do > you set search_path in pgbouncer ? what is "cca." btw ? > > > The more interesting part is that when we issue an explain of the same > query we get different plans. We did this a few seconds apart so there > should be no difference in collected statistics. We ruled out prepared > statements, as we suspected the generic plan might be the problem, but it > is not. Is there any pgBouncer or PG17 parameter that might be the cause of > this? > > > Does this spawn any connections (such as dblink) ? are there limits per > user/db pool_size in pgbouncer ? > > No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, > UPDATE, DELETE,...) There are limits, but they are not hit. The query just > uses a different plan and runs slower because of that. > > Pgbouncer, in contrast to its old friend PgPool-II is completely passive, > just passes through SQL to the server as fast as possible as it can. But I > am sure you know that. Good luck, keep us posted! > > Yes, that is what puzzles me. > > What is the pgbouncer's timeout in the server connections ? > > How about "idle in transaction" ? do you get any of those? What's the > isolation level ? > > How about the user ? is this the same user doing pgadmin queries VS via > the app ? > > Can you identify the user under which the problem is manifested and : > > ALTER user "unlucky_user" SET log_statement = 'all'; > > ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help > you debug the prepared statements .. just in case , and other stuff not > printed by log_statement = all. > > None of those parameters should affect the fact that when issuing the > explain select query (the statement is not prepared) from psql directly > gives a different result than issuing it over the pgbouncer connection. The > result is repeatable. > > We have rolled back pgbouncer to 1.12. and it seems the problem persists. > This is one of the weirdest things I have ever seen with PostgreSQL. > > > Regards, > Mladen Marinović > > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:38 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 2025-05-05 12:52 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 13:09 ` Re: Different execution plans in PG17 and pgBouncer... Efrain J. Berdecia <[email protected]> 2025-05-05 13:26 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> @ 2025-05-05 13:36 ` SERHAD ERDEM <[email protected]> 1 sibling, 0 replies; 13+ messages in thread From: SERHAD ERDEM @ 2025-05-05 13:36 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; Efrain J. Berdecia <[email protected]>; +Cc: Achilleas Mantzios <[email protected]>; [email protected] <[email protected]> Hi , wish you good lock with the "transaction mode" 🙂 if pgbouncer is not really needed , remove and use plain connections. i have experienced pgbouncer in session mode over 2 years with situation like "pain in the ass" , finaly removed this bouncing layer. ________________________________ From: Mladen Marinović <[email protected]> Sent: Monday, May 5, 2025 1:26 PM To: Efrain J. Berdecia <[email protected]> Cc: SERHAD ERDEM <[email protected]>; Achilleas Mantzios <[email protected]>; [email protected] <[email protected]> Subject: Re: Different execution plans in PG17 and pgBouncer... Hi, Mystery not solved...but identified. The pool is in transaction mode and some connections use set enable_mergejoin=off, but they do not set it back to on. Upon getting the connection from the pool the parameter is still set to off causing the planner to not use this kind of join which results in different plans when using this tainted pgbouncer connection instead of the clean one from pg17. The problem is that server_reset_query is not used when the pool is in transaction mode. Now, we have to see how to fix this problem. Regards, Mladen Marinović On Mon, May 5, 2025 at 3:10 PM Efrain J. Berdecia <[email protected]<mailto:[email protected]>> wrote: Is the query using parameter markers? Is the source executing the query forcing a "bad" data type casting? Yahoo Mail: Search, Organize, Conquer<https://mail.onelink.me/107872968?pid=nativeplacement&c=US_Acquisition_YMktg_315_SearchOrgConque...; On Mon, May 5, 2025 at 8:52 AM, Mladen Marinović <[email protected]<mailto:[email protected]>> wrote: On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM <[email protected]<mailto:[email protected]>> wrote: Hi , you had better try vacuum analyze for the whole db , pgbouncer connection layer can not causeslow queries. I did that already. But the slow query is the consequence of the different plan, not the statistics. ________________________________ From: Mladen Marinović <[email protected]<mailto:[email protected]>> Sent: Monday, May 5, 2025 12:27 PM To: Achilleas Mantzios <[email protected]<mailto:[email protected]>> Cc: [email protected]<mailto:[email protected]> <[email protected]<mailto:[email protected]>> Subject: Re: Different execution plans in PG17 and pgBouncer... On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios <[email protected]<mailto:[email protected]>> wrote: On 5/5/25 11:00, Mladen Marinović wrote: On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios <[email protected]<mailto:[email protected]>> wrote: On 5/5/25 09:52, Mladen Marinović wrote: Hi, We recently migrated our production instances from PG11 to PG17. While doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As everything worked on the test servers we pushed this to production a few weeks ago. We did not notice any problems until a few days ago (but the problems were here from the start). The main manifestation of the problems is a service that runs a fixed query to get a backlog of unprocessed data (limited to a 1000 rows). When testing the query using pgAdmin connected directly to the database we get a result in cca. 20 seconds. The same query runs for 2 hours when using pgBouncer to connect to the same database. That's a huge jump, I hope you guys did extensive testing of your app. In which language is your app written? If java, then define prepareThreshold=0 in your jdbc and set max_prepared_statements = 0 in pgbouncer. Mainly python, but the problem was noticed in a java service. Prepare treshold was already set to 0. We changed the max_prepared_statements to 0 from the default (200) but no change was noticed. How about search paths ? any difference on those between the two runs ? Do you set search_path in pgbouncer ? what is "cca." btw ? The more interesting part is that when we issue an explain of the same query we get different plans. We did this a few seconds apart so there should be no difference in collected statistics. We ruled out prepared statements, as we suspected the generic plan might be the problem, but it is not. Is there any pgBouncer or PG17 parameter that might be the cause of this? Does this spawn any connections (such as dblink) ? are there limits per user/db pool_size in pgbouncer ? No additional connection nor dbling. Just plain SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...) There are limits, but they are not hit. The query just uses a different plan and runs slower because of that. Pgbouncer, in contrast to its old friend PgPool-II is completely passive, just passes through SQL to the server as fast as possible as it can. But I am sure you know that. Good luck, keep us posted! Yes, that is what puzzles me. What is the pgbouncer's timeout in the server connections ? How about "idle in transaction" ? do you get any of those? What's the isolation level ? How about the user ? is this the same user doing pgadmin queries VS via the app ? Can you identify the user under which the problem is manifested and : ALTER user "unlucky_user" SET log_statement = 'all'; ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help you debug the prepared statements .. just in case , and other stuff not printed by log_statement = all. None of those parameters should affect the fact that when issuing the explain select query (the statement is not prepared) from psql directly gives a different result than issuing it over the pgbouncer connection. The result is repeatable. We have rolled back pgbouncer to 1.12. and it seems the problem persists. This is one of the weirdest things I have ever seen with PostgreSQL. Regards, Mladen Marinović ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 12:38 ` Re: Different execution plans in PG17 and pgBouncer... SERHAD ERDEM <[email protected]> 2025-05-05 12:52 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> 2025-05-05 13:09 ` Re: Different execution plans in PG17 and pgBouncer... Efrain J. Berdecia <[email protected]> 2025-05-05 13:26 ` Re: Different execution plans in PG17 and pgBouncer... Mladen Marinović <[email protected]> @ 2025-05-05 14:09 ` Achilleas Mantzios <[email protected]> 1 sibling, 0 replies; 13+ messages in thread From: Achilleas Mantzios @ 2025-05-05 14:09 UTC (permalink / raw) To: Mladen Marinović <[email protected]>; Efrain J. Berdecia <[email protected]>; +Cc: SERHAD ERDEM <[email protected]>; [email protected] <[email protected]> On 5/5/25 14:26, Mladen Marinović wrote: > Hi, > > Mystery not solved...but identified. The pool is in transaction mode > and some connections use set enable_mergejoin=off, but they do not set > it back to on. Upon getting the connection from the pool the parameter > is still set to off causing the planner to not use this kind of join > which results in different plans when using this tainted pgbouncer > connection instead of the clean one from pg17. > > The problem is that server_reset_query is not used when the pool is in > transaction mode. Now, we have to see how to fix this problem. But you've got this : https://www.pgbouncer.org/config.html " server_reset_query_always Whether |server_reset_query| should be run in all pooling modes. When this setting is off (default), the |server_reset_query| will be run only in pools that are in sessions-pooling mode. Connections in transaction-pooling mode should not have any need for a reset query. This setting is for working around broken setups that run applications that use session features over a transaction-pooled PgBouncer. It changes non-deterministic breakage to deterministic breakage: Clients always lose their state after each transaction. " > > Regards, > Mladen Marinović > > On Mon, May 5, 2025 at 3:10 PM Efrain J. Berdecia > <[email protected]> wrote: > > Is the query using parameter markers? Is the source executing the > query forcing a "bad" data type casting? > > Yahoo Mail: Search, Organize, Conquer > <https://mail.onelink.me/107872968?pid=nativeplacement&c=US_Acquisition_YMktg_315_SearchOrgConque...; > > On Mon, May 5, 2025 at 8:52 AM, Mladen Marinović > <[email protected]> wrote: > > > On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM > <[email protected]> wrote: > > Hi , you had better try vacuum analyze for the whole db > , pgbouncer connection layer can not causeslow queries. > > > I did that already. But the slow query is the consequence of > the different plan, not the statistics. > > ------------------------------------------------------------------------ > *From:* Mladen Marinović <[email protected]> > *Sent:* Monday, May 5, 2025 12:27 PM > *To:* Achilleas Mantzios <[email protected]> > *Cc:* [email protected] > <[email protected]> > *Subject:* Re: Different execution plans in PG17 and > pgBouncer... > > > On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios > <[email protected]> wrote: > > > On 5/5/25 11:00, Mladen Marinović wrote: > > > > On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios > <[email protected]> wrote: > > > On 5/5/25 09:52, Mladen Marinović wrote: > > Hi, > > We recently migrated our production > instances from PG11 to PG17. While doing > so we upgraded our pgBouncer instances > from 1.12 to 1.24. As everything worked on > the test servers we pushed this to > production a few weeks ago. We did not > notice any problems until a few days ago > (but the problems were here from the > start). The main manifestation of the > problems is a service that runs a fixed > query to get a backlog of unprocessed data > (limited to a 1000 rows). When testing the > query using pgAdmin connected directly to > the database we get a result in cca. 20 > seconds. The same query runs for 2 hours > when using pgBouncer to connect to the > same database. > > > That's a huge jump, I hope you guys did > extensive testing of your app. In which > language is your app written? If java, then > define prepareThreshold=0 in your jdbc and set > max_prepared_statements = 0 in pgbouncer. > > Mainly python, but the problem was noticed in a > java service. > Prepare treshold was already set to 0. We changed > the max_prepared_statements to 0 from the default > (200) but no change was noticed. > > How about search paths ? any difference on > those between the two runs ? Do you set > search_path in pgbouncer ? what is "cca." btw ? > > > The more interesting part is that when we > issue an explain of the same query we get > different plans. We did this a few seconds > apart so there should be no difference in > collected statistics. We ruled out > prepared statements, as we suspected the > generic plan might be the problem, but it > is not. Is there any pgBouncer or PG17 > parameter that might be the cause of this? > > > Does this spawn any connections (such as > dblink) ? are there limits per user/db > pool_size in pgbouncer ? > > No additional connection nor dbling. Just plain > SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...) > There are limits, but they are not hit. The query > just uses a different plan and runs slower because > of that. > > Pgbouncer, in contrast to its old friend > PgPool-II is completely passive, just passes > through SQL to the server as fast as possible > as it can. But I am sure you know that. Good > luck, keep us posted! > > Yes, that is what puzzles me. > > What is the pgbouncer's timeout in the server > connections ? > > How about "idle in transaction" ? do you get any of > those? What's the isolation level ? > > How about the user ? is this the same user doing > pgadmin queries VS via the app ? > > Can you identify the user under which the problem is > manifested and : > > ALTER user "unlucky_user" SET log_statement = 'all'; > > ALTER user "unlucky_user" SET > log_min_duration_statement = 0; -- to help you debug > the prepared statements .. just in case , and other > stuff not printed by log_statement = all. > > None of those parameters should affect the fact that when > issuing the explain select query (the statement is not > prepared) from psql directly gives a different result than > issuing it over the pgbouncer connection. The result is > repeatable. > > We have rolled back pgbouncer to 1.12. and it seems the > problem persists. This is one of the weirdest things I > have ever seen with PostgreSQL. > > Regards, > Mladen Marinović > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Different execution plans in PG17 and pgBouncer... 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> @ 2025-05-05 12:29 ` Achilleas Mantzios <[email protected]> 1 sibling, 0 replies; 13+ messages in thread From: Achilleas Mantzios @ 2025-05-05 12:29 UTC (permalink / raw) To: Ruben Morais <[email protected]>; [email protected] <[email protected]> On 5/5/25 11:30, Ruben Morais wrote: > HI, > > Could be a hint but test with jit to off. > If not wrong as you change from 11 to 17, that could be a cause, just > try it because in some cases plans changed when jit is on. > Not only JIT but also other extensions (such as timescale) could greatly affect the plan. He could find if any GUC are set for the particular user : select * from pg_db_role_setting where setrole = to_regrole('<unlucky_user>'); > Regards, > *Rúben Morais* > > > > On Mon, May 5, 2025 at 11:07 AM Achilleas Mantzios > <[email protected]> wrote: > > > On 5/5/25 11:00, Mladen Marinović wrote: >> >> >> On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios >> <[email protected]> wrote: >> >> >> On 5/5/25 09:52, Mladen Marinović wrote: >>> Hi, >>> >>> We recently migrated our production instances from PG11 to >>> PG17. While doing so we upgraded our pgBouncer instances >>> from 1.12 to 1.24. As everything worked on the test servers >>> we pushed this to production a few weeks ago. We did not >>> notice any problems until a few days ago (but the problems >>> were here from the start). The main manifestation of the >>> problems is a service that runs a fixed query to get a >>> backlog of unprocessed data (limited to a 1000 rows). When >>> testing the query using pgAdmin connected directly to the >>> database we get a result in cca. 20 seconds. The same query >>> runs for 2 hours when using pgBouncer to connect to the same >>> database. >> >> >> That's a huge jump, I hope you guys did extensive testing of >> your app. In which language is your app written? If java, >> then define prepareThreshold=0 in your jdbc and set >> max_prepared_statements = 0 in pgbouncer. >> >> Mainly python, but the problem was noticed in a java service. >> Prepare treshold was already set to 0. We changed the >> max_prepared_statements to 0 from the default (200) but no change >> was noticed. >> >> How about search paths ? any difference on those between the >> two runs ? Do you set search_path in pgbouncer ? what is >> "cca." btw ? >> >>> >>> The more interesting part is that when we issue an explain >>> of the same query we get different plans. We did this a few >>> seconds apart so there should be no difference in collected >>> statistics. We ruled out prepared statements, as we >>> suspected the generic plan might be the problem, but it is >>> not. Is there any pgBouncer or PG17 parameter that might be >>> the cause of this? >> >> >> Does this spawn any connections (such as dblink) ? are there >> limits per user/db pool_size in pgbouncer ? >> >> No additional connection nor dbling. Just plain SQL (CTE, SELECT, >> INSERT, UPDATE, DELETE,...) There are limits, but they are not >> hit. The query just uses a different plan and runs slower because >> of that. >> >> Pgbouncer, in contrast to its old friend PgPool-II is >> completely passive, just passes through SQL to the server as >> fast as possible as it can. But I am sure you know that. Good >> luck, keep us posted! >> >> Yes, that is what puzzles me. > > What is the pgbouncer's timeout in the server connections ? > > How about "idle in transaction" ? do you get any of those? What's > the isolation level ? > > How about the user ? is this the same user doing pgadmin queries > VS via the app ? > > Can you identify the user under which the problem is manifested and : > > ALTER user "unlucky_user" SET log_statement = 'all'; > > ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- > to help you debug the prepared statements .. just in case , and > other stuff not printed by log_statement = all. > >> Regards, >> Mladen Marinović > ^ permalink raw reply [nested|flat] 13+ messages in thread
end of thread, other threads:[~2025-05-05 14:09 UTC | newest] Thread overview: 13+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-05-05 09:24 Re: Different execution plans in PG17 and pgBouncer... Achilleas Mantzios <[email protected]> 2025-05-05 10:07 ` Achilleas Mantzios <[email protected]> 2025-05-05 12:27 ` Mladen Marinović <[email protected]> 2025-05-05 12:36 ` Achilleas Mantzios <[email protected]> 2025-05-05 12:51 ` Mladen Marinović <[email protected]> 2025-05-05 12:38 ` SERHAD ERDEM <[email protected]> 2025-05-05 12:52 ` Mladen Marinović <[email protected]> 2025-05-05 13:07 ` SERHAD ERDEM <[email protected]> 2025-05-05 13:09 ` Efrain J. Berdecia <[email protected]> 2025-05-05 13:26 ` Mladen Marinović <[email protected]> 2025-05-05 13:36 ` SERHAD ERDEM <[email protected]> 2025-05-05 14:09 ` Achilleas Mantzios <[email protected]> 2025-05-05 12:29 ` Achilleas Mantzios <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox