public inbox for [email protected]help / color / mirror / Atom feed
Pgpool can't detect database status properly 6+ messages / 2 participants [nested] [flat]
* Pgpool can't detect database status properly @ 2025-12-12 21:25 Adam Blomeke <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Adam Blomeke @ 2025-12-12 21:25 UTC (permalink / raw) To: [email protected] I'm resending this as it's been sitting in the moderation queue for a while. Possibly because I didn't have a subject line? Anyways, any help would be great. Thanks! I’m setting up a pgpool cluster to replace a single node database in my environment. The single node is separate from the cluster at the moment. When it’s time to implement the DB I’m going to redo the backup/restore, throw an upgrade from pg15->18, and then bring the cluster and take over the old IP. *Environment:* - pgpool-II version: 4.6.3 (chirikoboshi) - PostgreSQL version: 18 - OS: RHEL9 - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198) + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby) *Issue:* I have pgpool configured and I’ve set it up using the scripts and config files from a different instance, one which has been running just fine for a year and a half or so. The issue I’m experiencing is that when I detach/reattach a node, it sits in waiting constantly. It never transitions to up. I have to manually change the status file to up for it to get to agree that it is, and when I try to drop the node it doesn't actually drop it. It just goes into waiting again. I also don’t see any connection attempts from the pgpool server to the postgres nodes if I look at postgres logs. I've confirmed that it can run the postgres commands from the command line. I've tried this both running pgpool as a service and running it directly from the command line. No difference in behavior. Here’s the log output: 2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail back. reconnect host 10.6.1.200(5432) === 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:4169 2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down (status: 2) 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:1524 2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart children because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we are in streaming replication mode and not all backends were down 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:4370 2025-12-03 14:20:49.038: main pid 1085028: LOG: find_primary_node_repeatedly: waiting for finding a primary node 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:2896 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: primary node is 0 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2815 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: standby node is 1 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2821 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new primary node: 0 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4660 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new main node: 0 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4667 2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done. reconnect host 10.6.1.200(5432) === 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4763 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker process received restart request 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION: pool_worker_child.c:182 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request received in pcp child process 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: pcp_child.c:173 2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 exits with status 0 in failover() 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4850 2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child pid 1085089 in failover() 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4854 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: 1085089 started 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: pcp_child.c:165 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process started 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION: pgpool_main.c:905 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp worker, pid=1085093 socket=7 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: pcp_child.c:327 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid: 1085093 exit with SUCCESS. 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:384 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid: 1085093 exits with status 0 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:398 2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback event detected 2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself 2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524 2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback event detected 2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself 2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524 2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback event detected 2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself 2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524 2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback event detected 2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself 2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524 2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback event detected 2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself 2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524 2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback event detected ......over and over and over again. pcp_node_info output: 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39 Logs show: node status[0]: 1 node status[1]: 2 Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2 (up). *auto_failback behavior:* - When a node is detached (pcp_detach_node), it goes to status 3 (down) - auto_failback triggers and moves it to status 1 (waiting) - Node never transitions from waiting to up *Key configuration:* backend_clustering_mode = 'streaming_replication' backend_hostname0 = '10.6.1.199' backend_hostname1 = '10.6.1.200' backend_application_name0 = 'nasdw_users_1' backend_application_name1 = 'nasdw_users_2' use_watchdog = on # 3 watchdog nodes configured auto_failback = on auto_failback_interval = 1 sr_check_period = 10 sr_check_user = 'pgpool' sr_check_database = 'nasdw_users' health_check_period = 1 health_check_user = 'pgpool' health_check_database = 'nasdw_users' failover_when_quorum_exists = on (default) failover_require_consensus = on (default) Cheers, Adam ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Pgpool can't detect database status properly @ 2025-12-15 23:43 Tatsuo Ishii <[email protected]> parent: Adam Blomeke <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Tatsuo Ishii @ 2025-12-15 23:43 UTC (permalink / raw) To: [email protected]; +Cc: [email protected] > I'm resending this as it's been sitting in the moderation queue for a > while. Possibly because I didn't have a subject line? Anyways, any help > would be great. Thanks! I received your email this time. > I’m setting up a pgpool cluster to replace a single node database in my > environment. The single node is separate from the cluster at the moment. > When it’s time to implement the DB I’m going to redo the backup/restore, > throw an upgrade from pg15->18, and then bring the cluster and take over > the old IP. > > > > *Environment:* > > - pgpool-II version: 4.6.3 (chirikoboshi) > - PostgreSQL version: 18 > - OS: RHEL9 > - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198) > + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby) > > > > *Issue:* > > I have pgpool configured and I’ve set it up using the scripts and config > files from a different instance, one which has been running just fine for a > year and a half or so. The issue I’m experiencing is that when I > detach/reattach a node, it sits in waiting constantly. It never transitions > to up. If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql and issue an SQL command, for example "SELECT 1", does it work? If it works, it means pgpool works fine. > I have to manually change the status file to up for it to get to > agree that it is, The pgpool status "waiting" means that the backend node has never revceived any query from pgpool clients yet. You can safely assume that that pgpool is up and running. Once pgpool receives queries, the status should be changed from "waiting" to "up". > and when I try to drop the node it doesn't actually drop > it. It just goes into waiting again. Sounds like an effect of auto fail back. because you set: auto_failback_interval = 1 pgpool almost immediately brings the pgpool to online. > I also don’t see any connection > attempts from the pgpool server to the postgres nodes if I look at postgres > logs. I've confirmed that it can run the postgres commands from the command > line. I've tried this both running pgpool as a service and running it > directly from the command line. No difference in behavior. Probably there's something wrong in the configuration or trying to connect to wrong IP and/or port. Please turn on log_client_messages and log_per_node_statement, then send an SQL command to pgpool, and examin the pgpool log. > Here’s the log output: > > 2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail back. > reconnect host 10.6.1.200(5432) === > > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:4169 > > 2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down > (status: 2) > > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:1524 > > 2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart children > because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we > are in streaming replication mode and not all backends were down > > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:4370 > > 2025-12-03 14:20:49.038: main pid 1085028: LOG: > find_primary_node_repeatedly: waiting for finding a primary node > > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:2896 > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: primary > node is 0 > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2815 > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: standby > node is 1 > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2821 > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new primary > node: 0 > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4660 > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new main > node: 0 > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4667 > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done. > reconnect host 10.6.1.200(5432) === > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4763 > > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker process > received restart request > > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION: > pool_worker_child.c:182 > > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request > received in pcp child process > > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: pcp_child.c:173 > > 2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 exits > with status 0 in failover() > > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4850 > > 2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child pid > 1085089 in failover() > > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4854 > > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: 1085089 > started > > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: pcp_child.c:165 > > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process started > > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION: > pgpool_main.c:905 > > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp worker, > pid=1085093 socket=7 > > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: pcp_child.c:327 > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid: > 1085093 exit with SUCCESS. > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:384 > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid: > 1085093 exits with status 0 > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:398 > > 2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback > event detected > > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself > > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524 > > 2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback > event detected > > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself > > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524 > > 2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback > event detected > > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself > > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524 > > 2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback > event detected > > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself > > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524 > > 2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback > event detected > > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself > > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524 > > 2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback > event detected > > ......over and over and over again. > > > > > pcp_node_info output: > > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none > 2025-12-03 14:04:39 > > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async > 2025-12-03 14:04:39 > > Logs show: > > node status[0]: 1 > > node status[1]: 2 > > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2 > (up). No, this does not show the backend status. Instead, it says > node status[0]: 1 This means backend 0 is primary. > node status[1]: 2 This means backend 1 is standby. > *auto_failback behavior:* > > - When a node is detached (pcp_detach_node), it goes to status 3 (down) > - auto_failback triggers and moves it to status 1 (waiting) > - Node never transitions from waiting to up Sounds like pgpool has not received queries. > *Key configuration:* > > backend_clustering_mode = 'streaming_replication' > > backend_hostname0 = '10.6.1.199' > > backend_hostname1 = '10.6.1.200' > > backend_application_name0 = 'nasdw_users_1' > > backend_application_name1 = 'nasdw_users_2' > > > > use_watchdog = on > > # 3 watchdog nodes configured > > > > auto_failback = on > > auto_failback_interval = 1 > > > > sr_check_period = 10 > > sr_check_user = 'pgpool' > > sr_check_database = 'nasdw_users' > > > > health_check_period = 1 > > health_check_user = 'pgpool' > > health_check_database = 'nasdw_users' > > > > failover_when_quorum_exists = on (default) > > failover_require_consensus = on (default) > Cheers, > Adam ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Pgpool can't detect database status properly @ 2025-12-16 15:57 Adam Blomeke <[email protected]> parent: Tatsuo Ishii <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Adam Blomeke @ 2025-12-16 15:57 UTC (permalink / raw) To: Tatsuo Ishii <[email protected]>; +Cc: [email protected] Thanks for the reply. Your response makes sense as I'm still setting this cluster up, so it's just me trying to connect to it. I'm curious then what the right process is for when I need to pull a node out of the cluster for maintenance (e.g. patching). I was under the impression that I should drop the node, do a pg_rewind, manually set it as a standby if it was the primary, and then add the node back in pgpool. I guess I can't do that with auto failback turned on? Cheers, Adam On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <[email protected]> wrote: > > I'm resending this as it's been sitting in the moderation queue for a > > while. Possibly because I didn't have a subject line? Anyways, any help > > would be great. Thanks! > > I received your email this time. > > > I’m setting up a pgpool cluster to replace a single node database in my > > environment. The single node is separate from the cluster at the moment. > > When it’s time to implement the DB I’m going to redo the backup/restore, > > throw an upgrade from pg15->18, and then bring the cluster and take over > > the old IP. > > > > > > > > *Environment:* > > > > - pgpool-II version: 4.6.3 (chirikoboshi) > > - PostgreSQL version: 18 > > - OS: RHEL9 > > - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, > 10.6.1.198) > > + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby) > > > > > > > > *Issue:* > > > > I have pgpool configured and I’ve set it up using the scripts and config > > files from a different instance, one which has been running just fine > for a > > year and a half or so. The issue I’m experiencing is that when I > > detach/reattach a node, it sits in waiting constantly. It never > transitions > > to up. > > If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql > and issue an SQL command, for example "SELECT 1", does it work? If it > works, it means pgpool works fine. > > > I have to manually change the status file to up for it to get to > > agree that it is, > > The pgpool status "waiting" means that the backend node has never > revceived any query from pgpool clients yet. You can safely assume > that that pgpool is up and running. Once pgpool receives queries, the > status should be changed from "waiting" to "up". > > > and when I try to drop the node it doesn't actually drop > > it. It just goes into waiting again. > > Sounds like an effect of auto fail back. because you set: > > auto_failback_interval = 1 > > pgpool almost immediately brings the pgpool to online. > > > I also don’t see any connection > > attempts from the pgpool server to the postgres nodes if I look at > postgres > > logs. I've confirmed that it can run the postgres commands from the > command > > line. I've tried this both running pgpool as a service and running it > > directly from the command line. No difference in behavior. > > Probably there's something wrong in the configuration or trying to > connect to wrong IP and/or port. Please turn on log_client_messages > and log_per_node_statement, then send an SQL command to pgpool, and > examin the pgpool log. > > > Here’s the log output: > > > > 2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail back. > > reconnect host 10.6.1.200(5432) === > > > > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:4169 > > > > 2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down > > (status: 2) > > > > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:1524 > > > > 2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart children > > because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we > > are in streaming replication mode and not all backends were down > > > > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:4370 > > > > 2025-12-03 14:20:49.038: main pid 1085028: LOG: > > find_primary_node_repeatedly: waiting for finding a primary node > > > > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:2896 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: > primary > > node is 0 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2815 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: > standby > > node is 1 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2821 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new > primary > > node: 0 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4660 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new main > > node: 0 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4667 > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done. > > reconnect host 10.6.1.200(5432) === > > > > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4763 > > > > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker > process > > received restart request > > > > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION: > > pool_worker_child.c:182 > > > > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request > > received in pcp child process > > > > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: pcp_child.c:173 > > > > 2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 exits > > with status 0 in failover() > > > > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4850 > > > > 2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child pid > > 1085089 in failover() > > > > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4854 > > > > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: 1085089 > > started > > > > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: pcp_child.c:165 > > > > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process > started > > > > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION: > > pgpool_main.c:905 > > > > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp > worker, > > pid=1085093 socket=7 > > > > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: pcp_child.c:327 > > > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with > pid: > > 1085093 exit with SUCCESS. > > > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:384 > > > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with > pid: > > 1085093 exits with status 0 > > > > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:398 > > > > 2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback > > event detected > > > > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself > > > > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524 > > > > 2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback > > event detected > > > > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself > > > > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524 > > > > 2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback > > event detected > > > > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself > > > > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524 > > > > 2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback > > event detected > > > > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself > > > > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524 > > > > 2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback > > event detected > > > > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself > > > > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524 > > > > 2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback > > event detected > > > > ......over and over and over again. > > > > > > > > > > pcp_node_info output: > > > > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none > > 2025-12-03 14:04:39 > > > > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async > > 2025-12-03 14:04:39 > > > > Logs show: > > > > node status[0]: 1 > > > > node status[1]: 2 > > > > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2 > > (up). > > No, this does not show the backend status. Instead, it says > > > node status[0]: 1 > > This means backend 0 is primary. > > > node status[1]: 2 > > This means backend 1 is standby. > > > *auto_failback behavior:* > > > > - When a node is detached (pcp_detach_node), it goes to status 3 > (down) > > - auto_failback triggers and moves it to status 1 (waiting) > > - Node never transitions from waiting to up > > Sounds like pgpool has not received queries. > > > *Key configuration:* > > > > backend_clustering_mode = 'streaming_replication' > > > > backend_hostname0 = '10.6.1.199' > > > > backend_hostname1 = '10.6.1.200' > > > > backend_application_name0 = 'nasdw_users_1' > > > > backend_application_name1 = 'nasdw_users_2' > > > > > > > > use_watchdog = on > > > > # 3 watchdog nodes configured > > > > > > > > auto_failback = on > > > > auto_failback_interval = 1 > > > > > > > > sr_check_period = 10 > > > > sr_check_user = 'pgpool' > > > > sr_check_database = 'nasdw_users' > > > > > > > > health_check_period = 1 > > > > health_check_user = 'pgpool' > > > > health_check_database = 'nasdw_users' > > > > > > > > failover_when_quorum_exists = on (default) > > > > failover_require_consensus = on (default) > > Cheers, > > Adam > ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Pgpool can't detect database status properly @ 2025-12-17 01:29 Tatsuo Ishii <[email protected]> parent: Adam Blomeke <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Tatsuo Ishii @ 2025-12-17 01:29 UTC (permalink / raw) To: [email protected]; +Cc: [email protected] > Thanks for the reply. > > Your response makes sense as I'm still setting this cluster up, so it's > just me trying to connect to it. > > I'm curious then what the right process is for when I need to pull a node > out of the cluster for maintenance (e.g. patching). I was under the > impression that I should drop the node, do a pg_rewind, manually set it as > a standby if it was the primary, and then add the node back in pgpool. I > guess I can't do that with auto failback turned on? Yes, while the maintenance, you should turn off auto failback. In fact, it's written in the document: https://www.pgpool.net/docs/47/en/html/runtime-config-failover.html#RUNTIME-CONFIG-FAILOVER-SETTINGS If you plan to detach standby node for maintenance, set this parameter to off beforehand. Otherwise it's possible that standby node is reattached against your intention. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp > Cheers, > Adam > > > On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <[email protected]> wrote: > >> > I'm resending this as it's been sitting in the moderation queue for a >> > while. Possibly because I didn't have a subject line? Anyways, any help >> > would be great. Thanks! >> >> I received your email this time. >> >> > I’m setting up a pgpool cluster to replace a single node database in my >> > environment. The single node is separate from the cluster at the moment. >> > When it’s time to implement the DB I’m going to redo the backup/restore, >> > throw an upgrade from pg15->18, and then bring the cluster and take over >> > the old IP. >> > >> > >> > >> > *Environment:* >> > >> > - pgpool-II version: 4.6.3 (chirikoboshi) >> > - PostgreSQL version: 18 >> > - OS: RHEL9 >> > - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, >> 10.6.1.198) >> > + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby) >> > >> > >> > >> > *Issue:* >> > >> > I have pgpool configured and I’ve set it up using the scripts and config >> > files from a different instance, one which has been running just fine >> for a >> > year and a half or so. The issue I’m experiencing is that when I >> > detach/reattach a node, it sits in waiting constantly. It never >> transitions >> > to up. >> >> If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql >> and issue an SQL command, for example "SELECT 1", does it work? If it >> works, it means pgpool works fine. >> >> > I have to manually change the status file to up for it to get to >> > agree that it is, >> >> The pgpool status "waiting" means that the backend node has never >> revceived any query from pgpool clients yet. You can safely assume >> that that pgpool is up and running. Once pgpool receives queries, the >> status should be changed from "waiting" to "up". >> >> > and when I try to drop the node it doesn't actually drop >> > it. It just goes into waiting again. >> >> Sounds like an effect of auto fail back. because you set: >> >> auto_failback_interval = 1 >> >> pgpool almost immediately brings the pgpool to online. >> >> > I also don’t see any connection >> > attempts from the pgpool server to the postgres nodes if I look at >> postgres >> > logs. I've confirmed that it can run the postgres commands from the >> command >> > line. I've tried this both running pgpool as a service and running it >> > directly from the command line. No difference in behavior. >> >> Probably there's something wrong in the configuration or trying to >> connect to wrong IP and/or port. Please turn on log_client_messages >> and log_per_node_statement, then send an SQL command to pgpool, and >> examin the pgpool log. >> >> > Here’s the log output: >> > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail back. >> > reconnect host 10.6.1.200(5432) === >> > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:4169 >> > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down >> > (status: 2) >> > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:1524 >> > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart children >> > because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we >> > are in streaming replication mode and not all backends were down >> > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:4370 >> > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG: >> > find_primary_node_repeatedly: waiting for finding a primary node >> > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:2896 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: >> primary >> > node is 0 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2815 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: >> standby >> > node is 1 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2821 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new >> primary >> > node: 0 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4660 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new main >> > node: 0 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4667 >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done. >> > reconnect host 10.6.1.200(5432) === >> > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4763 >> > >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker >> process >> > received restart request >> > >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION: >> > pool_worker_child.c:182 >> > >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request >> > received in pcp child process >> > >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: pcp_child.c:173 >> > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 exits >> > with status 0 in failover() >> > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4850 >> > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child pid >> > 1085089 in failover() >> > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4854 >> > >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: 1085089 >> > started >> > >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: pcp_child.c:165 >> > >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process >> started >> > >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION: >> > pgpool_main.c:905 >> > >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp >> worker, >> > pid=1085093 socket=7 >> > >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: pcp_child.c:327 >> > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with >> pid: >> > 1085093 exit with SUCCESS. >> > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:384 >> > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with >> pid: >> > 1085093 exits with status 0 >> > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:398 >> > >> > 2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback >> > event detected >> > >> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself >> > >> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524 >> > >> > 2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback >> > event detected >> > >> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself >> > >> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524 >> > >> > 2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback >> > event detected >> > >> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself >> > >> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524 >> > >> > 2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback >> > event detected >> > >> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself >> > >> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524 >> > >> > 2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback >> > event detected >> > >> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself >> > >> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524 >> > >> > 2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback >> > event detected >> > >> > ......over and over and over again. >> > >> > >> > >> > >> > pcp_node_info output: >> > >> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none >> > 2025-12-03 14:04:39 >> > >> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async >> > 2025-12-03 14:04:39 >> > >> > Logs show: >> > >> > node status[0]: 1 >> > >> > node status[1]: 2 >> > >> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2 >> > (up). >> >> No, this does not show the backend status. Instead, it says >> >> > node status[0]: 1 >> >> This means backend 0 is primary. >> >> > node status[1]: 2 >> >> This means backend 1 is standby. >> >> > *auto_failback behavior:* >> > >> > - When a node is detached (pcp_detach_node), it goes to status 3 >> (down) >> > - auto_failback triggers and moves it to status 1 (waiting) >> > - Node never transitions from waiting to up >> >> Sounds like pgpool has not received queries. >> >> > *Key configuration:* >> > >> > backend_clustering_mode = 'streaming_replication' >> > >> > backend_hostname0 = '10.6.1.199' >> > >> > backend_hostname1 = '10.6.1.200' >> > >> > backend_application_name0 = 'nasdw_users_1' >> > >> > backend_application_name1 = 'nasdw_users_2' >> > >> > >> > >> > use_watchdog = on >> > >> > # 3 watchdog nodes configured >> > >> > >> > >> > auto_failback = on >> > >> > auto_failback_interval = 1 >> > >> > >> > >> > sr_check_period = 10 >> > >> > sr_check_user = 'pgpool' >> > >> > sr_check_database = 'nasdw_users' >> > >> > >> > >> > health_check_period = 1 >> > >> > health_check_user = 'pgpool' >> > >> > health_check_database = 'nasdw_users' >> > >> > >> > >> > failover_when_quorum_exists = on (default) >> > >> > failover_require_consensus = on (default) >> > Cheers, >> > Adam >> ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Pgpool can't detect database status properly @ 2025-12-30 21:59 Adam Blomeke <[email protected]> parent: Tatsuo Ishii <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Adam Blomeke @ 2025-12-30 21:59 UTC (permalink / raw) To: Tatsuo Ishii <[email protected]>; +Cc: [email protected] Follow up on this. I've got autofailback set to off, but it's executing the follow primary script on the node as soon as I attempt to detach it. Is this expected behavior? [postgres@awaprodxtrldbpgpool1 ~]$ grep auto_failback /etc/pgpool-II/pgpool.conf auto_failback = off auto_failback_interval = 1 # Min interval of executing auto_failback in [postgres@awaprodxtrldbpgpool1 ~]$ # Checkpoint on primary psql -h 10.6.1.199 -d postgres -c "CHECKPOINT;" # Detach primary to trigger failover pcp_detach_node -h 10.6.1.54 -U pgpool -n 0 # Verify node 1 is now primary pcp_node_info -a -U pgpool -h 10.6.1.54 CHECKPOINT pcp_detach_node -- Command Successful 10.6.1.199 5432 3 0.500000 down up primary primary 0 none none 2025-12-30 14:22:10 10.6.1.200 5432 2 0.500000 up up standby standby 0 none none 2025-12-30 12:04:09 [postgres@awaprodxtrldbpgpool1 ~]$ pcp_node_info -a -U pgpool -h 10.6.1.54 10.6.1.199 5432 3 0.500000 down down standby unknown 0 none none 2025-12-30 14:22:11 10.6.1.200 5432 2 0.500000 up up primary primary 0 none none 2025-12-30 14:22:11 [postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected] "/usr/pgsql-18/bin/pg_ctl -D /opt/data/data18 stop" Authorized uses only. All activity may be monitored and reported. pg_ctl: PID file "/opt/data/data18/postmaster.pid" does not exist Is server running? [postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected] Authorized uses only. All activity may be monitored and reported. Last login: Tue Dec 30 11:45:09 2025 from 10.6.1.196 -bash: typeset: TMOUT: readonly variable [postgres@awaproddbvmnasdwusers1 ~]$ cd /opt/data [postgres@awaproddbvmnasdwusers1 data]$ ll total 1236 drwxr-x---. 2 postgres postgres 6 Dec 30 14:22 archive18 drwx------. 23 postgres postgres 4096 Dec 29 12:46 data15 drwx------. 13 postgres postgres 4096 Dec 30 14:22 data18 drwx------. 2 postgres postgres 74 Dec 1 15:50 dbservercert -rw-r-----. 1 postgres postgres 1255731 Oct 24 18:04 pg_basebackup.log [postgres@awaproddbvmnasdwusers1 data]$ cd data18/ [postgres@awaproddbvmnasdwusers1 data18]$ ll total 8 -rw-------. 1 postgres postgres 233 Dec 30 14:22 backup_label drwx------. 6 postgres postgres 66 Dec 30 14:22 base drwx------. 2 postgres postgres 4096 Dec 30 14:22 global drwx------. 2 postgres postgres 26 Dec 30 14:22 pg_commit_ts drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_dynshmem drwx------. 4 postgres postgres 48 Dec 30 14:22 pg_multixact drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_notify drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_serial drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_snapshots drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_subtrans drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_twophase drwx------. 4 postgres postgres 121 Dec 30 14:22 pg_wal [postgres@awaproddbvmnasdwusers1 data18]$ Cheers, Adam On Tue, Dec 16, 2025 at 5:29 PM Tatsuo Ishii <[email protected]> wrote: > > Thanks for the reply. > > > > Your response makes sense as I'm still setting this cluster up, so it's > > just me trying to connect to it. > > > > I'm curious then what the right process is for when I need to pull a node > > out of the cluster for maintenance (e.g. patching). I was under the > > impression that I should drop the node, do a pg_rewind, manually set it > as > > a standby if it was the primary, and then add the node back in pgpool. I > > guess I can't do that with auto failback turned on? > > Yes, while the maintenance, you should turn off auto failback. > In fact, it's written in the document: > > https://www.pgpool.net/docs/47/en/html/runtime-config-failover.html#RUNTIME-CONFIG-FAILOVER-SETTINGS > > If you plan to detach standby node for maintenance, set this > parameter to off beforehand. Otherwise it's possible that standby > node is reattached against your intention. > > Best regards, > -- > Tatsuo Ishii > SRA OSS K.K. > English: http://www.sraoss.co.jp/index_en/ > Japanese:http://www.sraoss.co.jp > > > Cheers, > > Adam > > > > > > On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <[email protected]> > wrote: > > > >> > I'm resending this as it's been sitting in the moderation queue for a > >> > while. Possibly because I didn't have a subject line? Anyways, any > help > >> > would be great. Thanks! > >> > >> I received your email this time. > >> > >> > I’m setting up a pgpool cluster to replace a single node database in > my > >> > environment. The single node is separate from the cluster at the > moment. > >> > When it’s time to implement the DB I’m going to redo the > backup/restore, > >> > throw an upgrade from pg15->18, and then bring the cluster and take > over > >> > the old IP. > >> > > >> > > >> > > >> > *Environment:* > >> > > >> > - pgpool-II version: 4.6.3 (chirikoboshi) > >> > - PostgreSQL version: 18 > >> > - OS: RHEL9 > >> > - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, > >> 10.6.1.198) > >> > + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby) > >> > > >> > > >> > > >> > *Issue:* > >> > > >> > I have pgpool configured and I’ve set it up using the scripts and > config > >> > files from a different instance, one which has been running just fine > >> for a > >> > year and a half or so. The issue I’m experiencing is that when I > >> > detach/reattach a node, it sits in waiting constantly. It never > >> transitions > >> > to up. > >> > >> If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql > >> and issue an SQL command, for example "SELECT 1", does it work? If it > >> works, it means pgpool works fine. > >> > >> > I have to manually change the status file to up for it to get to > >> > agree that it is, > >> > >> The pgpool status "waiting" means that the backend node has never > >> revceived any query from pgpool clients yet. You can safely assume > >> that that pgpool is up and running. Once pgpool receives queries, the > >> status should be changed from "waiting" to "up". > >> > >> > and when I try to drop the node it doesn't actually drop > >> > it. It just goes into waiting again. > >> > >> Sounds like an effect of auto fail back. because you set: > >> > >> auto_failback_interval = 1 > >> > >> pgpool almost immediately brings the pgpool to online. > >> > >> > I also don’t see any connection > >> > attempts from the pgpool server to the postgres nodes if I look at > >> postgres > >> > logs. I've confirmed that it can run the postgres commands from the > >> command > >> > line. I've tried this both running pgpool as a service and running it > >> > directly from the command line. No difference in behavior. > >> > >> Probably there's something wrong in the configuration or trying to > >> connect to wrong IP and/or port. Please turn on log_client_messages > >> and log_per_node_statement, then send an SQL command to pgpool, and > >> examin the pgpool log. > >> > >> > Here’s the log output: > >> > > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail > back. > >> > reconnect host 10.6.1.200(5432) === > >> > > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: > pgpool_main.c:4169 > >> > > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down > >> > (status: 2) > >> > > >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: > pgpool_main.c:1524 > >> > > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart > children > >> > because we are failing back node id 1 host: 10.6.1.200 port: 5432 and > we > >> > are in streaming replication mode and not all backends were down > >> > > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: > pgpool_main.c:4370 > >> > > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG: > >> > find_primary_node_repeatedly: waiting for finding a primary node > >> > > >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: > pgpool_main.c:2896 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: > >> primary > >> > node is 0 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: > pgpool_main.c:2815 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: > >> standby > >> > node is 1 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: > pgpool_main.c:2821 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new > >> primary > >> > node: 0 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: > pgpool_main.c:4660 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new > main > >> > node: 0 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: > pgpool_main.c:4667 > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done. > >> > reconnect host 10.6.1.200(5432) === > >> > > >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: > pgpool_main.c:4763 > >> > > >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker > >> process > >> > received restart request > >> > > >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION: > >> > pool_worker_child.c:182 > >> > > >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request > >> > received in pcp child process > >> > > >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: > pcp_child.c:173 > >> > > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 > exits > >> > with status 0 in failover() > >> > > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: > pgpool_main.c:4850 > >> > > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child > pid > >> > 1085089 in failover() > >> > > >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: > pgpool_main.c:4854 > >> > > >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: > 1085089 > >> > started > >> > > >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: > pcp_child.c:165 > >> > > >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process > >> started > >> > > >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION: > >> > pgpool_main.c:905 > >> > > >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp > >> worker, > >> > pid=1085093 socket=7 > >> > > >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: > pcp_child.c:327 > >> > > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with > >> pid: > >> > 1085093 exit with SUCCESS. > >> > > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: > pcp_child.c:384 > >> > > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with > >> pid: > >> > 1085093 exits with status 0 > >> > > >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: > pcp_child.c:398 > >> > > >> > 2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback > >> > event detected > >> > > >> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself > >> > > >> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524 > >> > > >> > 2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback > >> > event detected > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524 > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback > >> > event detected > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524 > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback > >> > event detected > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself > >> > > >> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524 > >> > > >> > 2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback > >> > event detected > >> > > >> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself > >> > > >> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524 > >> > > >> > 2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback > >> > event detected > >> > > >> > ......over and over and over again. > >> > > >> > > >> > > >> > > >> > pcp_node_info output: > >> > > >> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none > >> > 2025-12-03 14:04:39 > >> > > >> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming > async > >> > 2025-12-03 14:04:39 > >> > > >> > Logs show: > >> > > >> > node status[0]: 1 > >> > > >> > node status[1]: 2 > >> > > >> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets > status 2 > >> > (up). > >> > >> No, this does not show the backend status. Instead, it says > >> > >> > node status[0]: 1 > >> > >> This means backend 0 is primary. > >> > >> > node status[1]: 2 > >> > >> This means backend 1 is standby. > >> > >> > *auto_failback behavior:* > >> > > >> > - When a node is detached (pcp_detach_node), it goes to status 3 > >> (down) > >> > - auto_failback triggers and moves it to status 1 (waiting) > >> > - Node never transitions from waiting to up > >> > >> Sounds like pgpool has not received queries. > >> > >> > *Key configuration:* > >> > > >> > backend_clustering_mode = 'streaming_replication' > >> > > >> > backend_hostname0 = '10.6.1.199' > >> > > >> > backend_hostname1 = '10.6.1.200' > >> > > >> > backend_application_name0 = 'nasdw_users_1' > >> > > >> > backend_application_name1 = 'nasdw_users_2' > >> > > >> > > >> > > >> > use_watchdog = on > >> > > >> > # 3 watchdog nodes configured > >> > > >> > > >> > > >> > auto_failback = on > >> > > >> > auto_failback_interval = 1 > >> > > >> > > >> > > >> > sr_check_period = 10 > >> > > >> > sr_check_user = 'pgpool' > >> > > >> > sr_check_database = 'nasdw_users' > >> > > >> > > >> > > >> > health_check_period = 1 > >> > > >> > health_check_user = 'pgpool' > >> > > >> > health_check_database = 'nasdw_users' > >> > > >> > > >> > > >> > failover_when_quorum_exists = on (default) > >> > > >> > failover_require_consensus = on (default) > >> > Cheers, > >> > Adam > >> > ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Pgpool can't detect database status properly @ 2025-12-31 00:34 Tatsuo Ishii <[email protected]> parent: Adam Blomeke <[email protected]> 0 siblings, 0 replies; 6+ messages in thread From: Tatsuo Ishii @ 2025-12-31 00:34 UTC (permalink / raw) To: [email protected]; +Cc: [email protected] If you want the follow primary script to not execute, you can temporarily turn off (set an empty string to "follow_primary_command") and reload pgpool.conf before running pcp_detatch_node. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp > Follow up on this. I've got autofailback set to off, but it's executing the > follow primary script on the node as soon as I attempt to detach it. Is > this expected behavior? > > [postgres@awaprodxtrldbpgpool1 ~]$ grep auto_failback > /etc/pgpool-II/pgpool.conf > auto_failback = off > auto_failback_interval = 1 > # Min interval of executing > auto_failback in > [postgres@awaprodxtrldbpgpool1 ~]$ # Checkpoint on primary > psql -h 10.6.1.199 -d postgres -c "CHECKPOINT;" > # Detach primary to trigger failover > pcp_detach_node -h 10.6.1.54 -U pgpool -n 0 > # Verify node 1 is now primary > pcp_node_info -a -U pgpool -h 10.6.1.54 > CHECKPOINT > pcp_detach_node -- Command Successful > 10.6.1.199 5432 3 0.500000 down up primary primary 0 none none 2025-12-30 > 14:22:10 > 10.6.1.200 5432 2 0.500000 up up standby standby 0 none none 2025-12-30 > 12:04:09 > [postgres@awaprodxtrldbpgpool1 ~]$ pcp_node_info -a -U pgpool -h 10.6.1.54 > 10.6.1.199 5432 3 0.500000 down down standby unknown 0 none none 2025-12-30 > 14:22:11 > 10.6.1.200 5432 2 0.500000 up up primary primary 0 none none 2025-12-30 > 14:22:11 > [postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected] > "/usr/pgsql-18/bin/pg_ctl -D /opt/data/data18 stop" > Authorized uses only. All activity may be monitored and reported. > pg_ctl: PID file "/opt/data/data18/postmaster.pid" does not exist > Is server running? > [postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected] > Authorized uses only. All activity may be monitored and reported. > Last login: Tue Dec 30 11:45:09 2025 from 10.6.1.196 > -bash: typeset: TMOUT: readonly variable > [postgres@awaproddbvmnasdwusers1 ~]$ cd /opt/data > [postgres@awaproddbvmnasdwusers1 data]$ ll > total 1236 > drwxr-x---. 2 postgres postgres 6 Dec 30 14:22 archive18 > drwx------. 23 postgres postgres 4096 Dec 29 12:46 data15 > drwx------. 13 postgres postgres 4096 Dec 30 14:22 data18 > drwx------. 2 postgres postgres 74 Dec 1 15:50 dbservercert > -rw-r-----. 1 postgres postgres 1255731 Oct 24 18:04 pg_basebackup.log > [postgres@awaproddbvmnasdwusers1 data]$ cd data18/ > [postgres@awaproddbvmnasdwusers1 data18]$ ll > total 8 > -rw-------. 1 postgres postgres 233 Dec 30 14:22 backup_label > drwx------. 6 postgres postgres 66 Dec 30 14:22 base > drwx------. 2 postgres postgres 4096 Dec 30 14:22 global > drwx------. 2 postgres postgres 26 Dec 30 14:22 pg_commit_ts > drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_dynshmem > drwx------. 4 postgres postgres 48 Dec 30 14:22 pg_multixact > drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_notify > drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_serial > drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_snapshots > drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_subtrans > drwx------. 2 postgres postgres 10 Dec 30 14:22 pg_twophase > drwx------. 4 postgres postgres 121 Dec 30 14:22 pg_wal > [postgres@awaproddbvmnasdwusers1 data18]$ > > Cheers, > Adam > > > On Tue, Dec 16, 2025 at 5:29 PM Tatsuo Ishii <[email protected]> wrote: > >> > Thanks for the reply. >> > >> > Your response makes sense as I'm still setting this cluster up, so it's >> > just me trying to connect to it. >> > >> > I'm curious then what the right process is for when I need to pull a node >> > out of the cluster for maintenance (e.g. patching). I was under the >> > impression that I should drop the node, do a pg_rewind, manually set it >> as >> > a standby if it was the primary, and then add the node back in pgpool. I >> > guess I can't do that with auto failback turned on? >> >> Yes, while the maintenance, you should turn off auto failback. >> In fact, it's written in the document: >> >> https://www.pgpool.net/docs/47/en/html/runtime-config-failover.html#RUNTIME-CONFIG-FAILOVER-SETTINGS >> >> If you plan to detach standby node for maintenance, set this >> parameter to off beforehand. Otherwise it's possible that standby >> node is reattached against your intention. >> >> Best regards, >> -- >> Tatsuo Ishii >> SRA OSS K.K. >> English: http://www.sraoss.co.jp/index_en/ >> Japanese:http://www.sraoss.co.jp >> >> > Cheers, >> > Adam >> > >> > >> > On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <[email protected]> >> wrote: >> > >> >> > I'm resending this as it's been sitting in the moderation queue for a >> >> > while. Possibly because I didn't have a subject line? Anyways, any >> help >> >> > would be great. Thanks! >> >> >> >> I received your email this time. >> >> >> >> > I’m setting up a pgpool cluster to replace a single node database in >> my >> >> > environment. The single node is separate from the cluster at the >> moment. >> >> > When it’s time to implement the DB I’m going to redo the >> backup/restore, >> >> > throw an upgrade from pg15->18, and then bring the cluster and take >> over >> >> > the old IP. >> >> > >> >> > >> >> > >> >> > *Environment:* >> >> > >> >> > - pgpool-II version: 4.6.3 (chirikoboshi) >> >> > - PostgreSQL version: 18 >> >> > - OS: RHEL9 >> >> > - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, >> >> 10.6.1.198) >> >> > + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby) >> >> > >> >> > >> >> > >> >> > *Issue:* >> >> > >> >> > I have pgpool configured and I’ve set it up using the scripts and >> config >> >> > files from a different instance, one which has been running just fine >> >> for a >> >> > year and a half or so. The issue I’m experiencing is that when I >> >> > detach/reattach a node, it sits in waiting constantly. It never >> >> transitions >> >> > to up. >> >> >> >> If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql >> >> and issue an SQL command, for example "SELECT 1", does it work? If it >> >> works, it means pgpool works fine. >> >> >> >> > I have to manually change the status file to up for it to get to >> >> > agree that it is, >> >> >> >> The pgpool status "waiting" means that the backend node has never >> >> revceived any query from pgpool clients yet. You can safely assume >> >> that that pgpool is up and running. Once pgpool receives queries, the >> >> status should be changed from "waiting" to "up". >> >> >> >> > and when I try to drop the node it doesn't actually drop >> >> > it. It just goes into waiting again. >> >> >> >> Sounds like an effect of auto fail back. because you set: >> >> >> >> auto_failback_interval = 1 >> >> >> >> pgpool almost immediately brings the pgpool to online. >> >> >> >> > I also don’t see any connection >> >> > attempts from the pgpool server to the postgres nodes if I look at >> >> postgres >> >> > logs. I've confirmed that it can run the postgres commands from the >> >> command >> >> > line. I've tried this both running pgpool as a service and running it >> >> > directly from the command line. No difference in behavior. >> >> >> >> Probably there's something wrong in the configuration or trying to >> >> connect to wrong IP and/or port. Please turn on log_client_messages >> >> and log_per_node_statement, then send an SQL command to pgpool, and >> >> examin the pgpool log. >> >> >> >> > Here’s the log output: >> >> > >> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail >> back. >> >> > reconnect host 10.6.1.200(5432) === >> >> > >> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: >> pgpool_main.c:4169 >> >> > >> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down >> >> > (status: 2) >> >> > >> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: >> pgpool_main.c:1524 >> >> > >> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart >> children >> >> > because we are failing back node id 1 host: 10.6.1.200 port: 5432 and >> we >> >> > are in streaming replication mode and not all backends were down >> >> > >> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: >> pgpool_main.c:4370 >> >> > >> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG: >> >> > find_primary_node_repeatedly: waiting for finding a primary node >> >> > >> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: >> pgpool_main.c:2896 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: >> >> primary >> >> > node is 0 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: >> pgpool_main.c:2815 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: >> >> standby >> >> > node is 1 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: >> pgpool_main.c:2821 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new >> >> primary >> >> > node: 0 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: >> pgpool_main.c:4660 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new >> main >> >> > node: 0 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: >> pgpool_main.c:4667 >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done. >> >> > reconnect host 10.6.1.200(5432) === >> >> > >> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: >> pgpool_main.c:4763 >> >> > >> >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker >> >> process >> >> > received restart request >> >> > >> >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION: >> >> > pool_worker_child.c:182 >> >> > >> >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request >> >> > received in pcp child process >> >> > >> >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: >> pcp_child.c:173 >> >> > >> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 >> exits >> >> > with status 0 in failover() >> >> > >> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: >> pgpool_main.c:4850 >> >> > >> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child >> pid >> >> > 1085089 in failover() >> >> > >> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: >> pgpool_main.c:4854 >> >> > >> >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: >> 1085089 >> >> > started >> >> > >> >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: >> pcp_child.c:165 >> >> > >> >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process >> >> started >> >> > >> >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION: >> >> > pgpool_main.c:905 >> >> > >> >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp >> >> worker, >> >> > pid=1085093 socket=7 >> >> > >> >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: >> pcp_child.c:327 >> >> > >> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with >> >> pid: >> >> > 1085093 exit with SUCCESS. >> >> > >> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: >> pcp_child.c:384 >> >> > >> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with >> >> pid: >> >> > 1085093 exits with status 0 >> >> > >> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: >> pcp_child.c:398 >> >> > >> >> > 2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback >> >> > event detected >> >> > >> >> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself >> >> > >> >> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524 >> >> > >> >> > 2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback >> >> > event detected >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524 >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback >> >> > event detected >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524 >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback >> >> > event detected >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself >> >> > >> >> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524 >> >> > >> >> > 2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback >> >> > event detected >> >> > >> >> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself >> >> > >> >> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524 >> >> > >> >> > 2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback >> >> > event detected >> >> > >> >> > ......over and over and over again. >> >> > >> >> > >> >> > >> >> > >> >> > pcp_node_info output: >> >> > >> >> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none >> >> > 2025-12-03 14:04:39 >> >> > >> >> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming >> async >> >> > 2025-12-03 14:04:39 >> >> > >> >> > Logs show: >> >> > >> >> > node status[0]: 1 >> >> > >> >> > node status[1]: 2 >> >> > >> >> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets >> status 2 >> >> > (up). >> >> >> >> No, this does not show the backend status. Instead, it says >> >> >> >> > node status[0]: 1 >> >> >> >> This means backend 0 is primary. >> >> >> >> > node status[1]: 2 >> >> >> >> This means backend 1 is standby. >> >> >> >> > *auto_failback behavior:* >> >> > >> >> > - When a node is detached (pcp_detach_node), it goes to status 3 >> >> (down) >> >> > - auto_failback triggers and moves it to status 1 (waiting) >> >> > - Node never transitions from waiting to up >> >> >> >> Sounds like pgpool has not received queries. >> >> >> >> > *Key configuration:* >> >> > >> >> > backend_clustering_mode = 'streaming_replication' >> >> > >> >> > backend_hostname0 = '10.6.1.199' >> >> > >> >> > backend_hostname1 = '10.6.1.200' >> >> > >> >> > backend_application_name0 = 'nasdw_users_1' >> >> > >> >> > backend_application_name1 = 'nasdw_users_2' >> >> > >> >> > >> >> > >> >> > use_watchdog = on >> >> > >> >> > # 3 watchdog nodes configured >> >> > >> >> > >> >> > >> >> > auto_failback = on >> >> > >> >> > auto_failback_interval = 1 >> >> > >> >> > >> >> > >> >> > sr_check_period = 10 >> >> > >> >> > sr_check_user = 'pgpool' >> >> > >> >> > sr_check_database = 'nasdw_users' >> >> > >> >> > >> >> > >> >> > health_check_period = 1 >> >> > >> >> > health_check_user = 'pgpool' >> >> > >> >> > health_check_database = 'nasdw_users' >> >> > >> >> > >> >> > >> >> > failover_when_quorum_exists = on (default) >> >> > >> >> > failover_require_consensus = on (default) >> >> > Cheers, >> >> > Adam >> >> >> ^ permalink raw reply [nested|flat] 6+ messages in thread
end of thread, other threads:[~2025-12-31 00:34 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-12-12 21:25 Pgpool can't detect database status properly Adam Blomeke <[email protected]> 2025-12-15 23:43 ` Tatsuo Ishii <[email protected]> 2025-12-16 15:57 ` Adam Blomeke <[email protected]> 2025-12-17 01:29 ` Tatsuo Ishii <[email protected]> 2025-12-30 21:59 ` Adam Blomeke <[email protected]> 2025-12-31 00:34 ` Tatsuo Ishii <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox