Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uc11T-007z75-4i for pgpool-general@arkaria.postgresql.org; Wed, 16 Jul 2025 12:14:07 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uc11R-009uy7-7F for pgpool-general@arkaria.postgresql.org; Wed, 16 Jul 2025 12:14:05 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uc11Q-009uxo-T0 for pgpool-general@lists.postgresql.org; Wed, 16 Jul 2025 12:14:05 +0000 Received: from mail-wr1-x42f.google.com ([2a00:1450:4864:20::42f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uc11P-007Znh-0t for pgpool-general@lists.postgresql.org; Wed, 16 Jul 2025 12:14:04 +0000 Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-3a54700a46eso4112144f8f.1 for ; Wed, 16 Jul 2025 05:14:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1752668040; x=1753272840; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=HaE/uvmpjXwkT+32+z27ndNj6LA8nziCy1bugpOmhKs=; b=BT/+L/Ap48ygqRprKirqzZwR3SAURWIXYx6KkONajTW9FpAYagPk5AyJuKAUmQF0Jp CvdWFbRqXxmDgWPnX70PQ7zhGs8K/Z08LbbPBNubE7hOXJ+KTZ+gyr/Kn5yRPEZP7arm kyoHmTK29J/LPw8ZetDWxT3Ez2X+6bq6RlOfv4LTDwPvddZU5X+BH9l0m5f+l0Ae3YMg 68kDJmv6AsCOiRjMw0HGzQ0437B53Xbu7HjwpfR6PPkka00w7CDbD3xFSR6JB2WLBW6q p1VzjA7HEUC5mnLXPOdTxl/KlygG3Zigc1Pgh9pkWLbxAHCxcp4FoOzYSnZ/830lYCps MrTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752668040; x=1753272840; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=HaE/uvmpjXwkT+32+z27ndNj6LA8nziCy1bugpOmhKs=; b=jcNC9ViTtAx8vZDlVD2fs9QhuApgVg6ULbPvbiOMjkwpXxEd8ve8KZY0nfWDP/C+j5 K9phwCjvUGeNA52DdcvzzEEF9MVNDaUA5QbAEtP1psHLIHzirz0qfk+zZher4kQv/QWN gdJr9nI9OwRMaCGPPV/iJx5bWBysEGqLTob1QV7sow6VZ0muaQIZa6z9rwR2gX4Je189 v8kB14I/6aqz+WmXjaC4Jexeph9aGOWWQ+PpsdZqngvAGDZPe8fN06IntyOXFIqG5NDU 0QjnkGNhrkcko6QgvF06pp2i3Tw/YSRrPjCMLQ06DYkHpXFhEwo3tygZw4zwPz1dIdtm RH+Q== X-Gm-Message-State: AOJu0Yy1aD4S9l8fyt52vIiZ6RzfbbztwYNOdDLjt6hYzydxfEiCDsMD 2SSH77QbTd0uJFX7fo/6lQofCSgGGgpdUnQMz6XZXCadbMW6omVNqI2eR4L793w/qXe7iyQ3R36 CV1hnQzz5TT26hkTL2xeqiJOkOiD2aELoFmokqkY= X-Gm-Gg: ASbGnctSDfFaMVnIdftExNkV8KpwAks7vxy/D+moq+TqalqX0tPGNxGFlMft8zrcD2Q VbI8j3CZGY1qpdVeNwIpIcdJQVDTXzZuPoaXzO+KGKjtiV7TrN4zA1vi+dfdm/yJBtscL6D33PL P9wxs9ZYEODkgfTnAmT8hEiLRURbJI3Q+jNFgVQsaC7g85CmieOih9e8hQJ5oKRv2k2vOQsVl1H UW0 X-Google-Smtp-Source: AGHT+IF4MRPSrfvfWSD6NUxSLiZ7kCVHSYgt5g3yicmEE6gjbaqG3XAxO8qMndw5xDEcg2JjWQGToZlT4NRUlfPy44I= X-Received: by 2002:a05:6000:144d:b0:3a5:2cb5:63fa with SMTP id ffacd0b85a97d-3b60dd4b5a5mr2680797f8f.2.1752668040222; Wed, 16 Jul 2025 05:14:00 -0700 (PDT) MIME-Version: 1.0 From: TV Date: Wed, 16 Jul 2025 14:13:49 +0200 X-Gm-Features: Ac12FXwdplIHGdQppzgyByh7zCXc9kUA8oPA_kpwLjtpY3jRdqQSWY4E3KfleEg Message-ID: Subject: Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions? To: pgpool-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000f2618a063a0ad333" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f2618a063a0ad333 Content-Type: text/plain; charset="UTF-8" Just to give a bit of background, we've recently migrated from old setup to new physical servers, and are running Ubuntu24 and latest (4.6.2) version of pgpool. The migration went fairly well, but we are noticing that the performance isn't any better than on the old servers, frankly it seems... worse. I was wondering if some of the pgpool pros could look over our config and perhaps recommend some changes/tuning? Hardware-wise, it's pretty beefy, we got 1TB of RAM to play with, 80 cores (2 processors with 20 physical cores and 40 virtual), hardware definitely doesn't seem to be a problem. Some 'highlights' from pgpool.conf, feel free to ask for other settings if they'll help to clear up the picture: num_init_children = 3500 max_pool = 1 child_life_time = 0 child_max_connections = 0 connection_life_time = 500 client_idle_limit = 600 process_management_mode = dynamic process_management_strategy = gentle min_spare_children = 50 max_spare_children = 100 connection_cache = on load_balance_mode = on disable_load_balance_on_write = 'transaction' statement_level_load_balance = on This is a 4 node cluster running PG13 and backend_weight is set to 1 for all 4 nodes. Some of the errors we are seeing in pgpool logs: 2025-07-15 10:57:32: pid 2629089: CONTEXT: while checking replication time lag 2025-07-15 10:57:32: pid 2629089: LOCATION: pool_worker_child.c:644 2025-07-15 10:57:33: pid 3892376: LOG: Error message from backend: DB node id: 2 message: "canceling statement due to conflict with recovery" 2025-07-15 10:57:33: pid 3892376: LOCATION: pool_proto_modules.c:3226 2025-07-15 10:57:33: pid 3892376: FATAL: unable to read data from DB node 2 2025-07-15 10:57:33: pid 3892376: DETAIL: EOF encountered with backend 2025-07-15 10:57:33: pid 3892376: LOCATION: pool_stream.c:274 2025-07-15 10:57:33: pid 2629004: LOG: child process with pid: 3892376 exited with success and will not be restarted 2025-07-15 10:57:33: pid 2629004: LOCATION: pgpool_main.c:2059 Also this: 2025-07-15 11:02:22: pid 3892505: ERROR: unable to read data from DB node 2 2025-07-15 11:02:22: pid 3892505: DETAIL: do not failover because failover_on_backend_error is off 2025-07-15 11:02:22: pid 3892505: LOCATION: pool_stream.c:407 2025-07-15 11:02:22: pid 3892505: WARNING: write on backend 2 failed with error :"Broken pipe" 2025-07-15 11:02:22: pid 3892505: DETAIL: while trying to write data from offset: 0 wlen: 17 2025-07-15 11:02:22: pid 3892505: LOCATION: pool_stream.c:714 2025-07-15 11:02:22: pid 3892505: WARNING: write on backend 2 failed with error :"Broken pipe" 2025-07-15 11:02:22: pid 3892505: DETAIL: while trying to write data from offset: 0 wlen: 5 2025-07-15 11:02:22: pid 3892505: LOCATION: pool_stream.c:714 saw this is as well: 2025-07-15 11:05:12: pid 2629089: CONTEXT: while checking replication time lag 2025-07-15 11:05:12: pid 2629089: LOCATION: pool_worker_child.c:644 2025-07-15 11:05:19: pid 3891928: ERROR: unable to read data from frontend 2025-07-15 11:05:19: pid 3891928: DETAIL: socket read function returned -1 2025-07-15 11:05:19: pid 3891928: LOCATION: pool_stream.c:414 2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 1 backend pid: 3938180 statement: "ABORT" message: "terminating connection due to conflict with recovery" 2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955 2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 2 backend pid: 3929256 statement: "ABORT" message: "terminating connection due to conflict with recovery" 2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955 2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 3 backend pid: 3929098 statement: "ABORT" message: "terminating connection due to conflict with recovery" 2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955 2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 0 backend pid: 3060000 statement: "ABORT" message: "terminating connection due to idle-in-transaction timeout" 2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955 2025-07-15 11:05:19: pid 3891928: WARNING: write on backend 1 failed with error :"Broken pipe" 2025-07-15 11:05:19: pid 3891928: DETAIL: while trying to write data from offset: 0 wlen: 5 Some of these generally seem to suggest connectivity problems? Anything you can suggest to look into? It's also worth noting that if we bypass the pgpool VIP and connect the applications directly to the DB master node, there are no problems reported so it sure does seem like something with our pgpool setup... Any help will be much recommended. --000000000000f2618a063a0ad333 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Just to give a bit of background, we've recently = migrated from old=20 setup to new physical servers, and are running Ubuntu24 and latest=20 (4.6.2) version of pgpool.=C2=A0 The migration went fairly well, but we are= =20 noticing that the performance isn't any better than on the old servers,= =20 frankly it seems... worse.=C2=A0 I was wondering if some of the pgpool pros= =20 could look over our config and perhaps recommend some changes/tuning?=C2=A0= =20 Hardware-wise, it's pretty beefy, we got 1TB of RAM to play with, 80=20 cores (2 processors with 20 physical cores and 40 virtual), hardware=20 definitely doesn't seem to be a problem.=C2=A0 =C2=A0Some 'highligh= ts' from=20 pgpool.conf, feel free to ask for other settings if they'll help to=20 clear up the picture:

num_init_children =3D 3500
max_pool =3D 1
child_life_time =3D 0
child_max= _connections =3D 0
connection_life_time =3D 500
client_= idle_limit =3D 600
process_management_mode =3D dynamic
= process_management_strategy =3D gentle
min_spare_children =3D 50<= br>max_spare_children =3D 100
connection_cache =3D on
l= oad_balance_mode =3D on
disable_load_balance_on_write =3D 'transacti= on'
statement_level_load_balance =3D on

<= div>This is a 4 node cluster running PG13 and=C2=A0backend_weight is set to= 1 for all 4 nodes.

Some of the errors we are seei= ng in pgpool logs:
2025-07-15 10:57:32: pid 2629089: CONTEXT: =C2= =A0while checking replication time lag
2025-07-15 10:57:32: pid 2629089:= LOCATION: =C2=A0pool_worker_child.c:644
2025-07-15 10:57:33: pid 3892376: LOG: =C2=A0Error message from backend: DB node id: = 2=20 message: "canceling statement due to conflict with recovery"
2= 025-07-15 10:57:33: pid 3892376: LOCATION: =C2=A0pool_proto_modules.c:3226<= br>2025-07-15 10:57:33: pid 3892376: FATAL: =C2=A0unable to read data from = DB node 2
2025-07-15 10:57:33: pid 3892376: DETAIL: =C2=A0EOF encountere= d with backend
2025-07-15 10:57:33: pid 3892376: LOCATION: =C2=A0pool_st= ream.c:274
2025-07-15 10:57:33: pid 2629004: LOG: =C2=A0child process wi= th pid: 3892376 exited with success and will not be restarted
2025-07-15= 10:57:33: pid 2629004: LOCATION: =C2=A0pgpool_main.c:2059
Also this:
2025-07-15 11:02:22: pid 3892505: ERROR: = =C2=A0unable to read data from DB node 2
2025-07-15 11:02:22: pid 389250= 5: DETAIL: =C2=A0do not failover because failover_on_backend_error is off2025-07-15 11:02:22: pid 3892505: LOCATION: =C2=A0pool_stream.c:407
20= 25-07-15 11:02:22: pid 3892505: WARNING: =C2=A0write on backend 2 failed wi= th error :"Broken pipe"
2025-07-15 11:02:22: pid 3892505: DETA= IL: =C2=A0while trying to write data from offset: 0 wlen: 17
2025-07-15 = 11:02:22: pid 3892505: LOCATION: =C2=A0pool_stream.c:714
2025-07-15 11:0= 2:22: pid 3892505: WARNING: =C2=A0write on backend 2 failed with error :&qu= ot;Broken pipe"
2025-07-15 11:02:22: pid 3892505: DETAIL: =C2=A0whi= le trying to write data from offset: 0 wlen: 5
2025-07-15 11:02:22: pid = 3892505: LOCATION: =C2=A0pool_stream.c:714


saw this is as well:
2025-07-15 11:05:12: pid 2629089: CO= NTEXT: =C2=A0while checking replication time lag
2025-07-15 11:05:12: pi= d 2629089: LOCATION: =C2=A0pool_worker_child.c:644
2025-07-15 11:05:19: = pid 3891928: ERROR: =C2=A0unable to read data from frontend
2025-07-15 1= 1:05:19: pid 3891928: DETAIL: =C2=A0socket read function returned -1
202= 5-07-15 11:05:19: pid 3891928: LOCATION: =C2=A0pool_stream.c:414
2025-07= -15 11:05:19: pid 3891928: LOG: =C2=A0pool_send_and_wait: Error or notice=20 message from backend: DB node id: 1 backend pid: 3938180 statement:=20 "ABORT" message:
"terminating connection due to conflict= with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION: =C2=A0p= ool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG: =C2=A0pool_send_and_wait: Error or notice=20 message from backend: DB node id: 2 backend pid: 3929256 statement:=20 "ABORT" message:
"terminating connection due to conflict= with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION: =C2=A0p= ool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG: =C2=A0pool_send_and_wait: Error or notice=20 message from backend: DB node id: 3 backend pid: 3929098 statement:=20 "ABORT" message:
"terminating connection due to conflict= with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION: =C2=A0p= ool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG: =C2=A0pool_send_and_wait: Error or notice=20 message from backend: DB node id: 0 backend pid: 3060000 statement:=20 "ABORT" message:
"terminating connection due to idle-in-= transaction timeout"
2025-07-15 11:05:19: pid 3891928: LOCATION: = =C2=A0pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: WARNIN= G: =C2=A0write on backend 1 failed with error :"Broken pipe"
2= 025-07-15 11:05:19: pid 3891928: DETAIL: =C2=A0while trying to write data f= rom offset: 0 wlen: 5

Some of these generally seem to suggest connectivity problems?=C2=A0 Anything y= ou can suggest to look into?=C2=A0 =C2=A0It's also worth noting that if w= e bypass=20 the pgpool VIP and connect the applications directly to the DB master=20 node, there are no problems reported so it sure does seem like something with our pgpool setup...

Any help will be much re= commended.
<= br>=C2=A0

--000000000000f2618a063a0ad333--