From: Adam Blomeke <[email protected]>
To: [email protected]
Subject: Pgpool can't detect database status properly
Date: Fri, 12 Dec 2025 16:25:59 -0500
Message-ID: <CAG9Amsj61Oy7YGJw40uv7fedZwabLu2b5dVmJK4aDrZLDMVj+w@mail.gmail.com>

I'm resending this as it's been sitting in the moderation queue for a
while. Possibly because I didn't have a subject line? Anyways, any help
would be great. Thanks!

I’m setting up a pgpool cluster to replace a single-node database in my
environment. The single node is separate from the cluster at the moment.
When it’s time to put the new DB into service, I’m going to redo the
backup/restore, throw in an upgrade from PG 15 to 18, and then bring the
cluster up and have it take over the old IP.



*Environment:*

   - pgpool-II version: 4.6.3 (chirikoboshi)
   - PostgreSQL version: 18
   - OS: RHEL9
   - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
     plus 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)



*Issue:*

I have pgpool configured using the scripts and config files from a
different instance, one which has been running just fine for a year and a
half or so. The issue I’m experiencing is that when I detach/reattach a
node, it stays in waiting indefinitely and never transitions to up. I have
to manually change the status file to up to get pgpool to agree that the
node is up, and when I try to drop the node it doesn't actually drop; it
just goes into waiting again. I also don’t see any connection attempts from
the pgpool server to the postgres nodes when I look at the postgres logs.
I've confirmed that the pgpool server can run the postgres commands from
the command line. I've tried this both running pgpool as a service and
running it directly from the command line, with no difference in behavior.
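
A plain TCP probe run on the pgpool host is one way to cross-check the "no connection attempts" observation. The following is only a minimal sketch: the helper names `can_connect` and `report` are hypothetical, the addresses and port are the ones listed in this message, and the probe only shows that a TCP connection opens, not that PostgreSQL authentication or pg_hba.conf rules would let pgpool in.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(hosts, port: int = 5432):
    """Build one human-readable line per backend host."""
    lines = []
    for host in hosts:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        lines.append(f"{host}:{port} is {state}")
    return lines
```

Calling `print("\n".join(report(["10.6.1.199", "10.6.1.200"])))` on each pgpool node would print one line per backend.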



Here’s the log output:

2025-12-03 14:20:49.037: main pid 1085028: LOG:  === Starting fail back. reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:4169
2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down (status: 2)
2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:1524
2025-12-03 14:20:49.038: main pid 1085028: LOG:  Do not restart children because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we are in streaming replication mode and not all backends were down
2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:4370
2025-12-03 14:20:49.038: main pid 1085028: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:2896
2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: primary node is 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2815
2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: standby node is 1
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2821
2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new primary node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4660
2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new main node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4667
2025-12-03 14:20:49.189: main pid 1085028: LOG:  === Failback done. reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4763
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG:  worker process received restart request
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:  pool_worker_child.c:182
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG:  restart request received in pcp child process
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION:  pcp_child.c:173
2025-12-03 14:20:50.193: main pid 1085028: LOG:  PCP child 1085087 exits with status 0 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4850
2025-12-03 14:20:50.193: main pid 1085028: LOG:  fork a new PCP child pid 1085089 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4854
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG:  PCP process: 1085089 started
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION:  pcp_child.c:165
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG:  process started
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:  pgpool_main.c:905
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG:  forked new pcp worker, pid=1085093 socket=7
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION:  pcp_child.c:327
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exit with SUCCESS.
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:384
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exits with status 0
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:398
2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback event detected
2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself
2025-12-03 14:25:39.480: child pid 1085050: LOCATION:  child.c:1524
2025-12-03 14:25:39.480: child pid 1085038: LOG:  failover or failback event detected
2025-12-03 14:25:39.481: child pid 1085038: DETAIL:  restarting myself
2025-12-03 14:25:39.481: child pid 1085038: LOCATION:  child.c:1524
2025-12-03 14:25:39.481: child pid 1085035: LOG:  failover or failback event detected
2025-12-03 14:25:39.481: child pid 1085035: DETAIL:  restarting myself
2025-12-03 14:25:39.481: child pid 1085035: LOCATION:  child.c:1524
2025-12-03 14:25:39.481: child pid 1085061: LOG:  failover or failback event detected
2025-12-03 14:25:39.481: child pid 1085061: DETAIL:  restarting myself
2025-12-03 14:25:39.481: child pid 1085061: LOCATION:  child.c:1524
2025-12-03 14:25:39.483: child pid 1085053: LOG:  failover or failback event detected
2025-12-03 14:25:39.483: child pid 1085053: DETAIL:  restarting myself
2025-12-03 14:25:39.483: child pid 1085053: LOCATION:  child.c:1524
2025-12-03 14:25:39.483: child pid 1085059: LOG:  failover or failback event detected

......over and over and over again.




pcp_node_info output:

10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39
10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39
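
To make the columns above easier to read, here is a hedged sketch that splits one pcp_node_info line into named fields. The field order follows the pcp_node_info documentation for recent pgpool-II releases as I understand it, and should be treated as an assumption; the `NodeInfo` type and `parse_node_info` function are hypothetical names, not part of pgpool.

```python
from typing import NamedTuple


class NodeInfo(NamedTuple):
    hostname: str
    port: int
    status: int              # numeric status as tracked by pgpool
    lb_weight: float
    status_name: str         # pgpool's own view of the node
    backend_status: str      # backend status as actually probed
    role: str                # role as tracked by pgpool
    backend_role: str        # role as actually reported by the backend
    replication_delay: float
    replication_state: str
    sync_state: str
    last_status_change: str  # "YYYY-MM-DD HH:MM:SS"


def parse_node_info(line: str) -> NodeInfo:
    """Parse one non-verbose pcp_node_info output line."""
    f = line.split()
    if len(f) != 13:  # 12 fields; the timestamp splits into date and time
        raise ValueError(f"expected 13 tokens, got {len(f)}")
    return NodeInfo(f[0], int(f[1]), int(f[2]), float(f[3]), f[4], f[5],
                    f[6], f[7], float(f[8]), f[9], f[10], f"{f[11]} {f[12]}")


sample = ("10.6.1.199 5432 1 0.500000 waiting up primary primary "
          "0 none none 2025-12-03 14:04:39")
node = parse_node_info(sample)
print(node.status_name, node.backend_status)  # prints: waiting up
```

Read this way, both nodes report a probed backend status of "up" while pgpool's own status stays at "waiting".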

Logs show:

node status[0]: 1

node status[1]: 2

Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
(up).
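
For reference, the numeric codes in those log lines can be decoded with a small sketch like the one below. Codes 1, 2 and 3 are named as in this message; the meaning given for code 0 and the comments on each state are taken from my reading of the pgpool-II documentation and are an assumption. `NodeStatus` and `decode` are hypothetical names.

```python
import re
from enum import IntEnum


class NodeStatus(IntEnum):
    INITIALIZING = 0  # assumed: only used during initialization
    WAITING = 1       # assumed: node is up, no connections pooled yet
    UP = 2            # node is up, connections are pooled
    DOWN = 3          # node is down


STATUS_LINE = re.compile(r"node status\[(\d+)\]:\s*(\d+)")


def decode(log_text: str) -> dict:
    """Map node id -> NodeStatus for every 'node status[N]: S' line found."""
    return {int(n): NodeStatus(int(s)) for n, s in STATUS_LINE.findall(log_text)}


log = "node status[0]: 1\nnode status[1]: 2"
for node_id, status in decode(log).items():
    print(f"node {node_id}: {status.name.lower()} ({status.value})")
# prints:
# node 0: waiting (1)
# node 1: up (2)
```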

*auto_failback behavior:*

   - When a node is detached (pcp_detach_node), it goes to status 3 (down)
   - auto_failback triggers and moves it to status 1 (waiting)
   - Node never transitions from waiting to up

*Key configuration:*

backend_clustering_mode = 'streaming_replication'
backend_hostname0 = '10.6.1.199'
backend_hostname1 = '10.6.1.200'
backend_application_name0 = 'nasdw_users_1'
backend_application_name1 = 'nasdw_users_2'

use_watchdog = on
# 3 watchdog nodes configured

auto_failback = on
auto_failback_interval = 1

sr_check_period = 10
sr_check_user = 'pgpool'
sr_check_database = 'nasdw_users'

health_check_period = 1
health_check_user = 'pgpool'
health_check_database = 'nasdw_users'

failover_when_quorum_exists = on  # default
failover_require_consensus = on  # default
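
Since these settings were copied from an instance that works, comparing the two configs setting by setting may help narrow things down. The sketch below is a minimal illustration under stated assumptions: `parse_pgpool_conf` and `diff_settings` are hypothetical helpers, the parser only handles simple `key = value` lines with `#` comments (a `#` inside a quoted value would be cut off), and the sample values are made up and do not reflect either real instance.

```python
def parse_pgpool_conf(text: str) -> dict:
    """Parse simple 'key = value' lines, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'")
    return settings


def diff_settings(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every differing key."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


# Made-up sample values, for illustration only.
working = parse_pgpool_conf("health_check_period = 5\nsr_check_user = 'pgpool'\n")
new = parse_pgpool_conf("health_check_period = 1  # seconds\nsr_check_user = 'pgpool'\n")
print(diff_settings(working, new))  # prints: {'health_check_period': ('5', '1')}
```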

Cheers,
Adam

