From: Adam Blomeke <[email protected]>
To: [email protected]
Subject: Pgpool can't detect database status properly
Date: Fri, 12 Dec 2025 16:25:59 -0500
Message-ID: <CAG9Amsj61Oy7YGJw40uv7fedZwabLu2b5dVmJK4aDrZLDMVj+w@mail.gmail.com>
I'm resending this as it's been sitting in the moderation queue for a
while. Possibly because I didn't have a subject line? Anyways, any help
would be great. Thanks!
I’m setting up a pgpool cluster to replace a single-node database in my
environment. The single node is separate from the cluster at the moment.
When it’s time to implement the DB I’m going to redo the backup/restore,
upgrade from PostgreSQL 15 to 18, and then bring the cluster up and have it
take over the old IP.
*Environment:*
- pgpool-II version: 4.6.3 (chirikoboshi)
- PostgreSQL version: 18
- OS: RHEL9
- Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
+ 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
*Issue:*
I configured pgpool using the scripts and config files from a different
instance, one which has been running just fine for a year and a half or so.
The issue I’m experiencing is that when I detach/reattach a node, it sits in
waiting indefinitely and never transitions to up. I have to manually change
the status file to up before pgpool will agree that the node is up, and when
I try to drop the node it doesn't actually drop it; it just goes into
waiting again. I also don’t see any connection attempts from the pgpool
server to the postgres nodes when I look at the postgres logs. I've
confirmed that the pgpool server can run the postgres commands from the
command line. I've tried running pgpool both as a service and directly from
the command line, with no difference in behavior.
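A minimal sketch of what such a manual command-line check might look like, assuming `psql` is installed on the pgpool host and that the `pgpool` user and `nasdw_users` database from the configuration below are the ones being tested. The helper names are hypothetical, and the query is only an illustration of a primary/standby probe, not necessarily the exact command that was run.

```python
import subprocess

# Backend addresses taken from the environment description in this post.
BACKENDS = [("10.6.1.199", 5432), ("10.6.1.200", 5432)]


def psql_check_command(host, port, user="pgpool", dbname="nasdw_users"):
    """Build a psql command asking one backend whether it is in recovery."""
    return [
        "psql", "-h", host, "-p", str(port), "-U", user, "-d", dbname,
        "-w",   # never prompt for a password (relies on .pgpass or trust auth)
        "-At",  # unaligned, tuples-only output: just 't' or 'f'
        "-c", "SELECT pg_is_in_recovery()",
    ]


def run_check(host, port):
    """Return 'f' for a primary, 't' for a standby, or None on any failure."""
    try:
        result = subprocess.run(psql_check_command(host, port),
                                capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None  # psql missing, or the connection hung
    return result.stdout.strip() if result.returncode == 0 else None
```

Calling `run_check(host, port)` for each entry in `BACKENDS` from a pgpool node should also leave a connection entry in the postgres log if connection logging is enabled, which gives something to compare against what pgpool itself produces.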
Here’s the log output:
2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail back.
reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:4169
2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down
(status: 2)
2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:1524
2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart children
because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we
are in streaming replication mode and not all backends were down
2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:4370
2025-12-03 14:20:49.038: main pid 1085028: LOG:
find_primary_node_repeatedly: waiting for finding a primary node
2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:2896
2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: primary
node is 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2815
2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: standby
node is 1
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2821
2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new primary
node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4660
2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new main
node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4667
2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done.
reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4763
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker process
received restart request
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:
pool_worker_child.c:182
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request
received in pcp child process
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: pcp_child.c:173
2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 exits
with status 0 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4850
2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child pid
1085089 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4854
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: 1085089
started
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: pcp_child.c:165
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process started
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:
pgpool_main.c:905
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp worker,
pid=1085093 socket=7
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: pcp_child.c:327
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid:
1085093 exit with SUCCESS.
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:384
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid:
1085093 exits with status 0
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:398
2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback
event detected
2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself
2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524
2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback
event detected
2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself
2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524
2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback
event detected
2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself
2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524
2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback
event detected
2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself
2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524
2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback
event detected
2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself
2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524
2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback
event detected
......over and over and over again.
pcp_node_info output:
10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39
10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39
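To make the columns easier to compare, here is a small sketch that splits a pcp_node_info line into named fields. The field names are an assumption based on my reading of the pcp_node_info documentation for recent pgpool-II releases (pgpool's view of status and role, followed by the actual backend status and role); they are not stated anywhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    host: str
    port: int
    status_code: int         # numeric status as pgpool sees it
    lb_weight: float
    pgpool_status: str       # e.g. "waiting"
    actual_status: str       # e.g. "up" (what the backend itself reports)
    pgpool_role: str
    actual_role: str
    replication_delay: str   # kept as text; the unit depends on configuration
    replication_state: str
    sync_state: str
    last_status_change: str


def parse_node_info(line: str) -> NodeInfo:
    """Parse one whitespace-separated line of non-verbose pcp_node_info output."""
    f = line.split()
    if len(f) != 13:
        raise ValueError(f"expected 13 fields, got {len(f)}")
    return NodeInfo(f[0], int(f[1]), int(f[2]), float(f[3]), f[4], f[5],
                    f[6], f[7], f[8], f[9], f[10], f"{f[11]} {f[12]}")


node = parse_node_info(
    "10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none "
    "2025-12-03 14:04:39"
)
print(node.pgpool_status, node.actual_status)
```

Read this way, both nodes report an actual backend status of "up" while pgpool's own status column says "waiting".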
Logs show:
node status[0]: 1
node status[1]: 2
Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
(up).
*auto_failback behavior:*
- When a node is detached (pcp_detach_node), it goes to status 3 (down)
- auto_failback triggers and moves it to status 1 (waiting)
- Node never transitions from waiting to up
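The sequence above can be written down as a tiny check over a history of status codes. This is only a sketch: the short names are the ones used in this post, the longer descriptions are my reading of the pcp_node_info documentation and should be double-checked, and `stuck_in_waiting` is a hypothetical helper, not part of pgpool.

```python
# code -> (name used in this post, description as I understand the docs)
STATUS = {
    1: ("waiting", "node is up, no connections yet"),
    2: ("up", "node is up, connections are pooled"),
    3: ("down", "node is down"),
}


def describe(history):
    """Turn a sequence of numeric status codes into readable names."""
    return [STATUS[code][0] for code in history]


def stuck_in_waiting(history):
    """True if, after its last 'down', the node only ever shows 'waiting'."""
    if 3 not in history:
        return False
    last_down = max(i for i, code in enumerate(history) if code == 3)
    after = history[last_down + 1:]
    return bool(after) and all(code == 1 for code in after)


# The behavior described above: up, detached, failed back, never up again.
observed = [2, 3, 1, 1, 1]
print(describe(observed), stuck_in_waiting(observed))
```

If the documented descriptions are accurate, status 1 may only mean that no client connection has gone through pgpool to that node yet, which is worth ruling out before treating it as a fault.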
*Key configuration:*
backend_clustering_mode = 'streaming_replication'
backend_hostname0 = '10.6.1.199'
backend_hostname1 = '10.6.1.200'
backend_application_name0 = 'nasdw_users_1'
backend_application_name1 = 'nasdw_users_2'
use_watchdog = on
# 3 watchdog nodes configured
auto_failback = on
auto_failback_interval = 1
sr_check_period = 10
sr_check_user = 'pgpool'
sr_check_database = 'nasdw_users'
health_check_period = 1
health_check_user = 'pgpool'
health_check_database = 'nasdw_users'
failover_when_quorum_exists = on   # default
failover_require_consensus = on    # default
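A quick way to sanity-check a fragment like this is to parse it and test the settings that auto_failback depends on. The sketch below uses a deliberately naive parser (it does not handle `#` inside quoted values), and the prerequisites it checks, a positive sr_check_period and a backend_application_name for every backend, are my understanding of the auto_failback documentation rather than something confirmed in this thread.

```python
def parse_conf(text):
    """Parse simple `key = 'value'` lines, ignoring `#` comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'")
    return settings


def auto_failback_problems(settings):
    """List missing prerequisites for auto_failback (assumed, see lead-in)."""
    problems = []
    if settings.get("auto_failback") != "on":
        return problems
    if int(settings.get("sr_check_period", "0")) <= 0:
        problems.append("sr_check_period must be greater than 0")
    for key in settings:
        if key.startswith("backend_hostname"):
            n = key[len("backend_hostname"):]
            if not settings.get("backend_application_name" + n):
                problems.append(f"backend_application_name{n} is not set")
    return problems


sample = """
backend_hostname0 = '10.6.1.199'
backend_hostname1 = '10.6.1.200'
backend_application_name0 = 'nasdw_users_1'
backend_application_name1 = 'nasdw_users_2'
auto_failback = on
sr_check_period = 10
"""
print(auto_failback_problems(parse_conf(sample)))
```

An empty list here only says these particular settings are present; it does not verify that the application names match what the standby actually reports in pg_stat_replication on the primary.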
Cheers,
Adam