Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions?


* Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions?
@ 2025-07-16 12:13  TV <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: TV @ 2025-07-16 12:13 UTC (permalink / raw)
  To: [email protected]

Just to give a bit of background: we recently migrated from our old setup to
new physical servers running Ubuntu 24 and the latest pgpool (4.6.2).  The
migration went fairly well, but performance is no better than on the old
servers; frankly, it seems worse.  Could some of the pgpool pros look over
our config and recommend changes or tuning?  The hardware is pretty beefy:
1TB of RAM and 80 logical cores (2 processors, each with 20 physical cores
and 40 hardware threads), so hardware doesn't seem to be the problem.  Here
are some highlights from pgpool.conf; feel free to ask for other settings if
they would help clear up the picture:

num_init_children = 3500
max_pool = 1
child_life_time = 0
child_max_connections = 0
connection_life_time = 500
client_idle_limit = 600
process_management_mode = dynamic
process_management_strategy = gentle
min_spare_children = 50
max_spare_children = 100
connection_cache = on
load_balance_mode = on
disable_load_balance_on_write = 'transaction'
statement_level_load_balance = on

This is a 4-node cluster running PG13, and backend_weight is set to 1 for
all 4 nodes.
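
A minimal sketch of a connection-budget check for the settings above. It assumes the commonly cited pgpool sizing rule that max_pool * num_init_children should not exceed max_connections minus superuser_reserved_connections on each backend. The max_connections and superuser_reserved_connections values are hypothetical placeholders, since the thread does not give them, and the function names are made up for illustration.

```python
# Rough connection-budget check for the pgpool.conf values quoted above.
# Assumption: the sizing rule
#   max_pool * num_init_children <= max_connections - superuser_reserved_connections
# applies per backend node. max_connections and superuser_reserved_connections
# are hypothetical placeholders, not values taken from this thread.


def backend_connections_needed(num_init_children: int, max_pool: int) -> int:
    """Worst-case number of connections pgpool may hold open to each backend."""
    return num_init_children * max_pool


def fits_backend(
    num_init_children: int,
    max_pool: int,
    max_connections: int,
    superuser_reserved_connections: int = 3,
) -> bool:
    """Return True if the worst case fits within the backend's connection limit."""
    needed = backend_connections_needed(num_init_children, max_pool)
    available = max_connections - superuser_reserved_connections
    return needed <= available


if __name__ == "__main__":
    # Values from the post: num_init_children = 3500, max_pool = 1.
    print(backend_connections_needed(3500, 1))
    # Hypothetical backend limit of 4000 connections.
    print(fits_backend(3500, 1, max_connections=4000))
```

With the posted values, each of the 4 backends would need to accept up to 3500 connections from pgpool in the worst case, so max_connections on every node is worth checking against that number.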

Some of the errors we are seeing in pgpool logs:
2025-07-15 10:57:32: pid 2629089: CONTEXT:  while checking replication time lag
2025-07-15 10:57:32: pid 2629089: LOCATION:  pool_worker_child.c:644
2025-07-15 10:57:33: pid 3892376: LOG:  Error message from backend: DB node id: 2 message: "canceling statement due to conflict with recovery"
2025-07-15 10:57:33: pid 3892376: LOCATION:  pool_proto_modules.c:3226
2025-07-15 10:57:33: pid 3892376: FATAL:  unable to read data from DB node 2
2025-07-15 10:57:33: pid 3892376: DETAIL:  EOF encountered with backend
2025-07-15 10:57:33: pid 3892376: LOCATION:  pool_stream.c:274
2025-07-15 10:57:33: pid 2629004: LOG:  child process with pid: 3892376 exited with success and will not be restarted
2025-07-15 10:57:33: pid 2629004: LOCATION:  pgpool_main.c:2059

Also this:
2025-07-15 11:02:22: pid 3892505: ERROR:  unable to read data from DB node 2
2025-07-15 11:02:22: pid 3892505: DETAIL:  do not failover because failover_on_backend_error is off
2025-07-15 11:02:22: pid 3892505: LOCATION:  pool_stream.c:407
2025-07-15 11:02:22: pid 3892505: WARNING:  write on backend 2 failed with error :"Broken pipe"
2025-07-15 11:02:22: pid 3892505: DETAIL:  while trying to write data from offset: 0 wlen: 17
2025-07-15 11:02:22: pid 3892505: LOCATION:  pool_stream.c:714
2025-07-15 11:02:22: pid 3892505: WARNING:  write on backend 2 failed with error :"Broken pipe"
2025-07-15 11:02:22: pid 3892505: DETAIL:  while trying to write data from offset: 0 wlen: 5
2025-07-15 11:02:22: pid 3892505: LOCATION:  pool_stream.c:714


Saw this as well:
2025-07-15 11:05:12: pid 2629089: CONTEXT:  while checking replication time lag
2025-07-15 11:05:12: pid 2629089: LOCATION:  pool_worker_child.c:644
2025-07-15 11:05:19: pid 3891928: ERROR:  unable to read data from frontend
2025-07-15 11:05:19: pid 3891928: DETAIL:  socket read function returned -1
2025-07-15 11:05:19: pid 3891928: LOCATION:  pool_stream.c:414
2025-07-15 11:05:19: pid 3891928: LOG:  pool_send_and_wait: Error or notice message from backend: DB node id: 1 backend pid: 3938180 statement: "ABORT" message: "terminating connection due to conflict with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION:  pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG:  pool_send_and_wait: Error or notice message from backend: DB node id: 2 backend pid: 3929256 statement: "ABORT" message: "terminating connection due to conflict with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION:  pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG:  pool_send_and_wait: Error or notice message from backend: DB node id: 3 backend pid: 3929098 statement: "ABORT" message: "terminating connection due to conflict with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION:  pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG:  pool_send_and_wait: Error or notice message from backend: DB node id: 0 backend pid: 3060000 statement: "ABORT" message: "terminating connection due to idle-in-transaction timeout"
2025-07-15 11:05:19: pid 3891928: LOCATION:  pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: WARNING:  write on backend 1 failed with error :"Broken pipe"
2025-07-15 11:05:19: pid 3891928: DETAIL:  while trying to write data from offset: 0 wlen: 5

Some of these seem to suggest connectivity problems.  Is there anything you
can suggest looking into?  It's also worth noting that if we bypass the
pgpool VIP and connect the applications directly to the DB master node, no
problems are reported, so it does seem to be something in our pgpool
setup...
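
To get a quick overview of log excerpts like the ones above, the entries can be tallied by severity. This is a rough sketch only: it assumes each entry sits on a single line in the real log file (not wrapped as in an email) and follows the "timestamp: pid N: LEVEL:  message" layout shown above. The tally function and the sample string are illustrative, not part of pgpool.

```python
import re
from collections import Counter

# Matches lines such as:
#   2025-07-15 10:57:33: pid 3892376: FATAL:  unable to read data from DB node 2
# Assumption: one log entry per line, in the layout seen in the excerpts above.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}: pid (\d+): ([A-Z]+):\s+(.*)$"
)


def tally(log_text: str):
    """Count entries per severity level and mentions of recovery conflicts."""
    levels = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            levels[match.group(2)] += 1
    # Plain substring count; covers both "canceling statement" and
    # "terminating connection" variants of the message.
    conflicts = log_text.count("conflict with recovery")
    return levels, conflicts


# Three lines copied from the first excerpt, unwrapped.
sample = """\
2025-07-15 10:57:33: pid 3892376: LOG:  Error message from backend: DB node id: 2 message: "canceling statement due to conflict with recovery"
2025-07-15 10:57:33: pid 3892376: FATAL:  unable to read data from DB node 2
2025-07-15 10:57:33: pid 3892376: DETAIL:  EOF encountered with backend
"""

levels, conflicts = tally(sample)
print(levels["FATAL"], conflicts)  # prints: 1 1
```

Running the same function over a full log file would show whether the FATAL and "Broken pipe" entries cluster around the recovery-conflict messages from the standby nodes.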

Any help would be much appreciated.



* Re: Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions?
@ 2025-07-16 12:32  Achilleas Mantzios <[email protected]>
  parent: TV <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Achilleas Mantzios @ 2025-07-16 12:32 UTC (permalink / raw)
  To: [email protected]

On 7/16/25 13:13, TV wrote:

> [snip]
>
> num_init_children = 3500
> max_pool = 1
> child_life_time = 0
> child_max_connections = 0
> connection_life_time = 500
> client_idle_limit = 600
> process_management_mode = dynamic
> process_management_strategy = gentle
> min_spare_children = 50
> max_spare_children = 100
> connection_cache = on
> load_balance_mode = on
> disable_load_balance_on_write = 'transaction'
> statement_level_load_balance = on
>
What's your backend_clustering_mode ?

> [snip]
>


* Re: Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions?
@ 2025-07-16 12:58  TV <[email protected]>
  parent: Achilleas Mantzios <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: TV @ 2025-07-16 12:58 UTC (permalink / raw)
  To: [email protected]

backend_clustering_mode = 'streaming_replication'

