Pgpool can't detect database status properly

public inbox for [email protected]  

6+ messages / 2 participants

* Pgpool can't detect database status properly
@ 2025-12-12 21:25  Adam Blomeke <[email protected]>

From: Adam Blomeke @ 2025-12-12 21:25 UTC
  To: [email protected]

I'm resending this as it's been sitting in the moderation queue for a
while. Possibly because I didn't have a subject line? Anyway, any help
would be great. Thanks!

I’m setting up a pgpool cluster to replace a single-node database in my
environment. The single node is separate from the cluster at the moment.
When it’s time to implement the DB, I’m going to redo the backup/restore,
throw in an upgrade from pg15 to pg18, and then bring the cluster up and
take over the old IP.

*Environment:*

   - pgpool-II version: 4.6.3 (chirikoboshi)
   - PostgreSQL version: 18
   - OS: RHEL9
   - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
     + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)

*Issue:*

I have pgpool configured, and I’ve set it up using the scripts and config
files from a different instance, one which has been running just fine for a
year and a half or so. The issue I’m experiencing is that when I
detach/reattach a node, it stays in "waiting" indefinitely. It never
transitions to "up". I have to manually change the status file to "up" for
pgpool to agree that the node is up, and when I try to drop the node, it
doesn't actually drop it; it just goes into "waiting" again. I also don’t
see any connection attempts from the pgpool server to the postgres nodes
in the postgres logs. I've confirmed that it can run the postgres commands
from the command line. I've tried this both running pgpool as a service and
running it directly from the command line, with no difference in behavior.

Here’s the log output:

2025-12-03 14:20:49.037: main pid 1085028: LOG:  === Starting fail back. reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:4169
2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down (status: 2)
2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:1524
2025-12-03 14:20:49.038: main pid 1085028: LOG:  Do not restart children because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we are in streaming replication mode and not all backends were down
2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:4370
2025-12-03 14:20:49.038: main pid 1085028: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:2896
2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: primary node is 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2815
2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: standby node is 1
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2821
2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new primary node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4660
2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new main node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4667
2025-12-03 14:20:49.189: main pid 1085028: LOG:  === Failback done. reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4763
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG:  worker process received restart request
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:  pool_worker_child.c:182
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG:  restart request received in pcp child process
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION:  pcp_child.c:173
2025-12-03 14:20:50.193: main pid 1085028: LOG:  PCP child 1085087 exits with status 0 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4850
2025-12-03 14:20:50.193: main pid 1085028: LOG:  fork a new PCP child pid 1085089 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4854
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG:  PCP process: 1085089 started
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION:  pcp_child.c:165
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG:  process started
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:  pgpool_main.c:905
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG:  forked new pcp worker, pid=1085093 socket=7
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION:  pcp_child.c:327
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exit with SUCCESS.
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:384
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exits with status 0
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:398
2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback event detected
2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself
2025-12-03 14:25:39.480: child pid 1085050: LOCATION:  child.c:1524
2025-12-03 14:25:39.480: child pid 1085038: LOG:  failover or failback event detected
2025-12-03 14:25:39.481: child pid 1085038: DETAIL:  restarting myself
2025-12-03 14:25:39.481: child pid 1085038: LOCATION:  child.c:1524
2025-12-03 14:25:39.481: child pid 1085035: LOG:  failover or failback event detected
2025-12-03 14:25:39.481: child pid 1085035: DETAIL:  restarting myself
2025-12-03 14:25:39.481: child pid 1085035: LOCATION:  child.c:1524
2025-12-03 14:25:39.481: child pid 1085061: LOG:  failover or failback event detected
2025-12-03 14:25:39.481: child pid 1085061: DETAIL:  restarting myself
2025-12-03 14:25:39.481: child pid 1085061: LOCATION:  child.c:1524
2025-12-03 14:25:39.483: child pid 1085053: LOG:  failover or failback event detected
2025-12-03 14:25:39.483: child pid 1085053: DETAIL:  restarting myself
2025-12-03 14:25:39.483: child pid 1085053: LOCATION:  child.c:1524
2025-12-03 14:25:39.483: child pid 1085059: LOG:  failover or failback event detected

......over and over and over again.

pcp_node_info output:

10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39
10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39
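The columns in this output can be decoded with a short parser. This is a sketch that assumes the non-verbose field order documented for pcp_node_info in pgpool-II 4.x: hostname, port, status, weight, status name, actual backend status, role, actual backend role, replication delay, replication state, sync state, and last status change. Check that order against your version before relying on it. The parser shows that status 1 ("waiting") together with backend status "up" means pgpool considers the backend alive but has not yet used it for client traffic. NodeInfo and parse_node_info are illustrative names, not pgpool APIs.

```python
from dataclasses import dataclass

# pcp_node_info "status" column (assumed from the pgpool-II 4.x docs).
STATUS_CODES = {
    0: "initializing (not normally displayed)",
    1: "up, no connections yet (waiting)",
    2: "up, connections pooled",
    3: "down",
}


@dataclass
class NodeInfo:
    hostname: str
    port: int
    status: int
    weight: float
    status_name: str          # pgpool's view: waiting / up / down
    backend_status: str       # actual backend status
    role: str                 # role as recorded by pgpool
    backend_role: str         # actual role of the backend
    replication_delay: str
    replication_state: str
    replication_sync_state: str
    last_status_change: str


def parse_node_info(line: str) -> NodeInfo:
    """Parse one line of non-verbose pcp_node_info output (assumed field order)."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"unexpected pcp_node_info line: {line!r}")
    return NodeInfo(
        hostname=fields[0],
        port=int(fields[1]),
        status=int(fields[2]),
        weight=float(fields[3]),
        status_name=fields[4],
        backend_status=fields[5],
        role=fields[6],
        backend_role=fields[7],
        replication_delay=fields[8],
        replication_state=fields[9],
        replication_sync_state=fields[10],
        # The timestamp contains a space, so it spans the last two tokens.
        last_status_change=f"{fields[11]} {fields[12]}",
    )


for line in [
    "10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39",
    "10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39",
]:
    node = parse_node_info(line)
    print(node.hostname, node.role, "->", STATUS_CODES[node.status])
```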

Logs show:

node status[0]: 1

node status[1]: 2

Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
(up).

*auto_failback behavior:*

   - When a node is detached (pcp_detach_node), it goes to status 3 (down)
   - auto_failback triggers and moves it to status 1 (waiting)
   - Node never transitions from waiting to up

*Key configuration:*

backend_clustering_mode = 'streaming_replication'
backend_hostname0 = '10.6.1.199'
backend_hostname1 = '10.6.1.200'
backend_application_name0 = 'nasdw_users_1'
backend_application_name1 = 'nasdw_users_2'

use_watchdog = on
# 3 watchdog nodes configured

auto_failback = on
auto_failback_interval = 1

sr_check_period = 10
sr_check_user = 'pgpool'
sr_check_database = 'nasdw_users'

health_check_period = 1
health_check_user = 'pgpool'
health_check_database = 'nasdw_users'

failover_when_quorum_exists = on   # default
failover_require_consensus = on    # default

Cheers,
Adam


* Re: Pgpool can't detect database status properly
@ 2025-12-15 23:43  Tatsuo Ishii <[email protected]>
  parent: Adam Blomeke <[email protected]>

From: Tatsuo Ishii @ 2025-12-15 23:43 UTC
  To: [email protected]; +Cc: [email protected]

> I'm resending this as it's been sitting in the moderation queue for a
> while. Possibly because I didn't have a subject line? Anyways, any help
> would be great. Thanks!

I received your email this time.

> I’m setting up a pgpool cluster to replace a single node database in my
> environment. The single node is separate from the cluster at the moment.
> When it’s time to implement the DB I’m going to redo the backup/restore,
> throw an upgrade from pg15->18, and then bring the cluster and take over
> the old IP.
> 
> 
> 
> *Environment:*
> 
>    - pgpool-II version: 4.6.3 (chirikoboshi)
>    - PostgreSQL version: 18
>    - OS: RHEL9
>    - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
>    + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
> 
> 
> 
> *Issue:*
> 
> I have pgpool configured and I’ve set it up using the scripts and config
> files from a different instance, one which has been running just fine for a
> year and a half or so. The issue I’m experiencing is that when I
> detach/reattach a node, it sits in waiting constantly. It never transitions
> to up.

If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql
and issue an SQL command, for example "SELECT 1", does it work? If it
works, it means pgpool works fine.

> I have to manually change the status file to up for it to get to
> agree that it is,

The pgpool status "waiting" means that the backend node has not yet
received any query from pgpool clients. You can safely assume that
pgpool is up and running. Once pgpool receives queries, the status
should change from "waiting" to "up".
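The waiting/up/down transitions discussed in this thread can be summarized as a small state model. This is a hypothetical simplification built only from what is said here: detaching marks the node down, auto_failback reattaches a still-streaming standby as "waiting", and the first client query through pgpool moves it to "up". NodeModel and its methods are illustrative names, not pgpool APIs. The numeric values follow the pcp_node_info status column.

```python
from enum import IntEnum


class BackendStatus(IntEnum):
    # Values match the pcp_node_info "status" column.
    INIT = 0
    WAITING = 1   # up, but no client query has gone through pgpool yet
    UP = 2        # up, connections pooled
    DOWN = 3


class NodeModel:
    """Hypothetical, simplified model of one backend's pgpool status."""

    def __init__(self, auto_failback: bool) -> None:
        self.auto_failback = auto_failback
        # Assumed starting point, matching the pcp_node_info output in this thread.
        self.status = BackendStatus.WAITING

    def client_query(self) -> None:
        # A client query routed through pgpool moves WAITING to UP.
        if self.status == BackendStatus.WAITING:
            self.status = BackendStatus.UP

    def detach(self) -> None:
        # pcp_detach_node marks the node DOWN.
        self.status = BackendStatus.DOWN

    def replication_check(self, still_streaming: bool) -> None:
        # With auto_failback on, a DOWN standby that is still streaming is
        # reattached, and it comes back as WAITING rather than UP.
        if (self.auto_failback and self.status == BackendStatus.DOWN
                and still_streaming):
            self.status = BackendStatus.WAITING


node = NodeModel(auto_failback=True)
node.detach()
node.replication_check(still_streaming=True)
print(node.status.name)  # WAITING: the detached node is back, as observed
node.client_query()
print(node.status.name)  # UP once a client actually uses the backend
```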

> and when I try to drop the node it doesn't actually drop
> it. It just goes into waiting again.

This sounds like an effect of auto failback. Because you set:

 auto_failback_interval = 1

pgpool brings the detached node back online almost immediately.

> I also don’t see any connection
> attempts from the pgpool server to the postgres nodes if I look at postgres
> logs. I've confirmed that it can run the postgres commands from the command
> line. I've tried this both running pgpool as a service and running it
> directly from the command line. No difference in behavior.

There is probably something wrong in the configuration, or pgpool is
trying to connect to the wrong IP and/or port. Please turn on
log_client_messages and log_per_node_statement, then send an SQL command
to pgpool, and examine the pgpool log.

> Logs show:
> 
> node status[0]: 1
> 
> node status[1]: 2
> 
> Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
> (up).

No, this does not show the backend status. Instead, it says:

> node status[0]: 1

This means backend 0 is primary.

> node status[1]: 2

This means backend 1 is standby.
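To avoid reading these log lines as pcp_node_info status codes, a sketch like the following decodes them as roles instead. It maps only the two values explained here (1 = primary, 2 = standby). The regular expression and decode_roles are assumptions for illustration, and any other value is reported as unknown rather than guessed.

```python
import re

# In "node status[i]: n" lines, n is the node's role (per the explanation
# above). Only the two values explained there are mapped.
ROLE_CODES = {1: "primary", 2: "standby"}

NODE_STATUS_RE = re.compile(r"node status\[(\d+)\]:\s*(\d+)")


def decode_roles(log_text: str) -> dict:
    """Map node id -> role for every "node status[i]: n" line in log_text."""
    roles = {}
    for match in NODE_STATUS_RE.finditer(log_text):
        node_id, code = int(match.group(1)), int(match.group(2))
        roles[node_id] = ROLE_CODES.get(code, f"unknown ({code})")
    return roles


print(decode_roles("node status[0]: 1\nnode status[1]: 2\n"))
```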

> *auto_failback behavior:*
> 
>    - When a node is detached (pcp_detach_node), it goes to status 3 (down)
>    - auto_failback triggers and moves it to status 1 (waiting)
>    - Node never transitions from waiting to up

It sounds like pgpool has not received any queries yet.

* Re: Pgpool can't detect database status properly
@ 2025-12-16 15:57  Adam Blomeke <[email protected]>
  parent: Tatsuo Ishii <[email protected]>

From: Adam Blomeke @ 2025-12-16 15:57 UTC
  To: Tatsuo Ishii <[email protected]>; +Cc: [email protected]

Thanks for the reply.

Your response makes sense as I'm still setting this cluster up, so it's
just me trying to connect to it.

I'm curious then what the right process is for when I need to pull a node
out of the cluster for maintenance (e.g. patching). I was under the
impression that I should drop the node, do a pg_rewind, manually set it as
a standby if it was the primary, and then add the node back in pgpool. I
guess I can't do that with auto failback turned on?

Cheers,
Adam


* Re: Pgpool can't detect database status properly
@ 2025-12-17 01:29  Tatsuo Ishii <[email protected]>
  parent: Adam Blomeke <[email protected]>

From: Tatsuo Ishii @ 2025-12-17 01:29 UTC
  To: [email protected]; +Cc: [email protected]

> Thanks for the reply.
> 
> Your response makes sense as I'm still setting this cluster up, so it's
> just me trying to connect to it.
> 
> I'm curious then what the right process is for when I need to pull a node
> out of the cluster for maintenance (e.g. patching). I was under the
> impression that I should drop the node, do a pg_rewind, manually set it as
> a standby if it was the primary, and then add the node back in pgpool. I
> guess I can't do that with auto failback turned on?

Yes, during maintenance you should turn off auto failback.
In fact, this is stated in the documentation:
https://www.pgpool.net/docs/47/en/html/runtime-config-failover.html#RUNTIME-CONFIG-FAILOVER-SETTINGS

    If you plan to detach standby node for maintenance, set this
    parameter to off beforehand. Otherwise it's possible that standby
    node is reattached against your intention.
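Adam's procedure, with the documentation's advice applied, can be written down as a dry-run checklist. This is a hedged sketch. maintenance_steps is a hypothetical helper, and the PCP host, port, and user are placeholders (9898 is pgpool's default pcp_port). Only pcp_detach_node and pcp_attach_node are given as concrete commands. The configuration change, reload, and pg_rewind steps are left as descriptions, because their exact invocation depends on the setup: for example, whether auto_failback can be reloaded on your version and whether every watchdog node needs the same setting.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[str]]  # (description, shell command or None)


def maintenance_steps(node_id: int, was_primary: bool,
                      pcp_host: str = "localhost", pcp_port: int = 9898,
                      pcp_user: str = "pgpool") -> List[Step]:
    """Hypothetical dry-run checklist for taking one backend out for maintenance."""
    pcp_args = f"-h {pcp_host} -p {pcp_port} -U {pcp_user} -n {node_id}"
    steps: List[Step] = [
        # Per the documentation quoted above: disable auto_failback first.
        ("Set auto_failback = off in pgpool.conf and reload pgpool", None),
        ("Detach the node from pgpool", f"pcp_detach_node {pcp_args}"),
        ("Do the maintenance work (e.g. OS patching) on the node", None),
    ]
    if was_primary:
        steps.append(("Resync the former primary with pg_rewind and "
                      "configure it as a standby", None))
    steps += [
        ("Reattach the node to pgpool", f"pcp_attach_node {pcp_args}"),
        ("Set auto_failback = on again and reload pgpool", None),
    ]
    return steps


for number, (description, command) in enumerate(
        maintenance_steps(node_id=1, was_primary=False), start=1):
    print(f"{number}. {description}")
    if command:
        print(f"   $ {command}")
```

Passing was_primary=True adds the pg_rewind step Adam described for a node that was the primary.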

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Cheers,
> Adam
> 
> 
> On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <[email protected]> wrote:
> 
>> >
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with
>> pid:
>> > 1085093 exit with SUCCESS.
>> >
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:384
>> >
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with
>> pid:
>> > 1085093 exits with status 0
>> >
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:398
>> >
>> > 2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback
>> > event detected
>> >
>> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself
>> >
>> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION:  child.c:1524
>> >
>> > 2025-12-03 14:25:39.480: child pid 1085038: LOG:  failover or failback
>> > event detected
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL:  restarting myself
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION:  child.c:1524
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085035: LOG:  failover or failback
>> > event detected
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL:  restarting myself
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION:  child.c:1524
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085061: LOG:  failover or failback
>> > event detected
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL:  restarting myself
>> >
>> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION:  child.c:1524
>> >
>> > 2025-12-03 14:25:39.483: child pid 1085053: LOG:  failover or failback
>> > event detected
>> >
>> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL:  restarting myself
>> >
>> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION:  child.c:1524
>> >
>> > 2025-12-03 14:25:39.483: child pid 1085059: LOG:  failover or failback
>> > event detected
>> >
>> > ......over and over and over again.
>> >
>> >
>> >
>> >
>> > pcp_node_info output:
>> >
>> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none
>> > 2025-12-03 14:04:39
>> >
>> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async
>> > 2025-12-03 14:04:39
>> >
>> > Logs show:
>> >
>> > node status[0]: 1
>> >
>> > node status[1]: 2
>> >
>> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
>> > (up).
>>
>> No, this does not show the backend status. Instead, it says
>>
>> > node status[0]: 1
>>
>> This means backend 0 is primary.
>>
>> > node status[1]: 2
>>
>> This means backend 1 is standby.
>>
>> > *auto_failback behavior:*
>> >
>> >    - When a node is detached (pcp_detach_node), it goes to status 3
>> (down)
>> >    - auto_failback triggers and moves it to status 1 (waiting)
>> >    - Node never transitions from waiting to up
>>
>> Sounds like pgpool has not received queries.
>>
>> > *Key configuration:*
>> >
>> > backend_clustering_mode = 'streaming_replication'
>> >
>> > backend_hostname0 = '10.6.1.199'
>> >
>> > backend_hostname1 = '10.6.1.200'
>> >
>> > backend_application_name0 = 'nasdw_users_1'
>> >
>> > backend_application_name1 = 'nasdw_users_2'
>> >
>> >
>> >
>> > use_watchdog = on
>> >
>> > # 3 watchdog nodes configured
>> >
>> >
>> >
>> > auto_failback = on
>> >
>> > auto_failback_interval = 1
>> >
>> >
>> >
>> > sr_check_period = 10
>> >
>> > sr_check_user = 'pgpool'
>> >
>> > sr_check_database = 'nasdw_users'
>> >
>> >
>> >
>> > health_check_period = 1
>> >
>> > health_check_user = 'pgpool'
>> >
>> > health_check_database = 'nasdw_users'
>> >
>> >
>> >
>> > failover_when_quorum_exists = on (default)
>> >
>> > failover_require_consensus = on (default)
>> > Cheers,
>> > Adam
>>



* Re: Pgpool can't detect database status properly
@ 2025-12-30 21:59  Adam Blomeke <[email protected]>
  parent: Tatsuo Ishii <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Adam Blomeke @ 2025-12-30 21:59 UTC (permalink / raw)
  To: Tatsuo Ishii <[email protected]>; +Cc: [email protected]

Following up on this. I've set auto_failback to off, but pgpool still executes
the follow primary script on the node as soon as I detach it. Is this
expected behavior?

[postgres@awaprodxtrldbpgpool1 ~]$ grep auto_failback /etc/pgpool-II/pgpool.conf
auto_failback = off
auto_failback_interval = 1
                                   # Min interval of executing auto_failback in
[postgres@awaprodxtrldbpgpool1 ~]$ # Checkpoint on primary
psql -h 10.6.1.199 -d postgres -c "CHECKPOINT;"
# Detach primary to trigger failover
pcp_detach_node -h 10.6.1.54 -U pgpool -n 0
# Verify node 1 is now primary
pcp_node_info -a -U pgpool -h 10.6.1.54
CHECKPOINT
pcp_detach_node -- Command Successful
10.6.1.199 5432 3 0.500000 down up primary primary 0 none none 2025-12-30 14:22:10
10.6.1.200 5432 2 0.500000 up up standby standby 0 none none 2025-12-30 12:04:09
[postgres@awaprodxtrldbpgpool1 ~]$ pcp_node_info -a -U pgpool -h 10.6.1.54
10.6.1.199 5432 3 0.500000 down down standby unknown 0 none none 2025-12-30 14:22:11
10.6.1.200 5432 2 0.500000 up up primary primary 0 none none 2025-12-30 14:22:11
[postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected] "/usr/pgsql-18/bin/pg_ctl -D /opt/data/data18 stop"
Authorized uses only. All activity may be monitored and reported.
pg_ctl: PID file "/opt/data/data18/postmaster.pid" does not exist
Is server running?
[postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected]
Authorized uses only. All activity may be monitored and reported.
Last login: Tue Dec 30 11:45:09 2025 from 10.6.1.196
-bash: typeset: TMOUT: readonly variable
[postgres@awaproddbvmnasdwusers1 ~]$ cd /opt/data
[postgres@awaproddbvmnasdwusers1 data]$ ll
total 1236
drwxr-x---.  2 postgres postgres       6 Dec 30 14:22 archive18
drwx------. 23 postgres postgres    4096 Dec 29 12:46 data15
drwx------. 13 postgres postgres    4096 Dec 30 14:22 data18
drwx------.  2 postgres postgres      74 Dec  1 15:50 dbservercert
-rw-r-----.  1 postgres postgres 1255731 Oct 24 18:04 pg_basebackup.log
[postgres@awaproddbvmnasdwusers1 data]$ cd data18/
[postgres@awaproddbvmnasdwusers1 data18]$ ll
total 8
-rw-------. 1 postgres postgres  233 Dec 30 14:22 backup_label
drwx------. 6 postgres postgres   66 Dec 30 14:22 base
drwx------. 2 postgres postgres 4096 Dec 30 14:22 global
drwx------. 2 postgres postgres   26 Dec 30 14:22 pg_commit_ts
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_dynshmem
drwx------. 4 postgres postgres   48 Dec 30 14:22 pg_multixact
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_notify
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_serial
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_snapshots
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_subtrans
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_twophase
drwx------. 4 postgres postgres  121 Dec 30 14:22 pg_wal
[postgres@awaproddbvmnasdwusers1 data18]$

Cheers,
Adam
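Reading the two pcp_node_info snapshots side by side is easier with the columns labelled. Below is a minimal sketch, not an official tool: the shell function `summarize_nodes` is hypothetical, and it assumes the `pcp_node_info -a` column order seen in the output above (hostname, port, status, weight, status_name, backend_status, role, backend_role, replication_delay, replication_state, replication_sync_state, last_status_change). The status codes follow the pcp_node_info documentation: 1 means the node is up but no connections have been pooled yet (shown as "waiting"), 2 means up with pooled connections, and 3 means down.

```shell
#!/bin/sh
# Minimal sketch: label the columns of `pcp_node_info -a` output.
# Assumed column order (matches the output in this thread):
#   1 hostname  2 port  3 status  4 weight  5 status_name
#   6 backend_status  7 role  8 backend_role  9 replication_delay
#   10 replication_state  11 replication_sync_state  12+ last_status_change
summarize_nodes() {
    awk '
    # Skip fragments such as a wrapped "14:22:11" timestamp line.
    NF >= 8 {
        if ($3 == 1)      meaning = "waiting (up, no connections yet)"
        else if ($3 == 2) meaning = "up (connections pooled)"
        else if ($3 == 3) meaning = "down"
        else              meaning = "unknown"
        # Node IDs are assigned in output order, starting at 0.
        printf "node %d (%s:%s): pgpool status %s = %s; backend_status=%s role=%s backend_role=%s\n",
            n++, $1, $2, $3, meaning, $6, $7, $8
    }'
}
```

Usage, for example: `pcp_node_info -a -U pgpool -h 10.6.1.54 | summarize_nodes`. Lines with fewer than eight fields are skipped, so a timestamp wrapped onto its own line (as happens when output is pasted into mail) does not shift the node numbering.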





* Re: Pgpool can't detect database status properly
@ 2025-12-31 00:34  Tatsuo Ishii <[email protected]>
  parent: Adam Blomeke <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tatsuo Ishii @ 2025-12-31 00:34 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

If you do not want the follow primary script to run, you can temporarily
disable it (set "follow_primary_command" to an empty string) and reload
pgpool.conf before running pcp_detach_node.
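The suggestion above can be scripted. This is a minimal sketch under stated assumptions, not a tested procedure: the helper names (`run`, `set_pgpool_param`, `detach_without_follow_primary`, `restore_follow_primary`) and the `DRY_RUN` switch are hypothetical; it assumes the active `follow_primary_command` line in pgpool.conf is not commented out, and that pcp authentication works the same way as for the pcp commands earlier in this thread. It edits and reloads a single pgpool node, so in a watchdog cluster you may need to repeat the change on the other pgpool nodes.

```shell
#!/bin/sh
# Minimal sketch: disable follow_primary_command around a manual detach.
# Hypothetical helpers; DRY_RUN=1 prints the pcp commands instead of running them
# (the pgpool.conf edit itself still happens).

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rewrite the active "name = ..." line of a pgpool.conf-style file.
# Commented-out lines are left alone. VALUE must not contain "|" or "&",
# which sed would interpret in the replacement.
set_pgpool_param() {
    conf=$1 name=$2 value=$3
    tmp=$(mktemp) || return 1
    sed "s|^[[:space:]]*$name[[:space:]]*=.*|$name = $value|" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # cat keeps the original owner and mode
    rc=$?
    rm -f "$tmp"
    return $rc
}

detach_without_follow_primary() {
    conf=$1 pcp_host=$2 node_id=$3
    cp -p "$conf" "$conf.bak" || return 1   # keep the original settings
    set_pgpool_param "$conf" follow_primary_command "''" || return 1
    run pcp_reload_config -h "$pcp_host" -U pgpool || return 1
    run pcp_detach_node -h "$pcp_host" -U pgpool -n "$node_id"
}

restore_follow_primary() {
    conf=$1 pcp_host=$2
    cat "$conf.bak" > "$conf" || return 1   # put the saved copy back
    rm -f "$conf.bak"
    run pcp_reload_config -h "$pcp_host" -U pgpool
}
```

For example, `detach_without_follow_primary /etc/pgpool-II/pgpool.conf 10.6.1.54 0` blanks the parameter, reloads, and detaches node 0; after maintenance, `restore_follow_primary /etc/pgpool-II/pgpool.conf 10.6.1.54` puts the saved copy back and reloads.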

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Follow up on this. I've got autofailback set to off, but it's executing the
> follow primary script on the node as soon as I attempt to detach it. Is
> this expected behavior?
> 
> [postgres@awaprodxtrldbpgpool1 ~]$ grep auto_failback
> /etc/pgpool-II/pgpool.conf
> auto_failback = off
> auto_failback_interval = 1
>                                    # Min interval of executing
> auto_failback in
> [postgres@awaprodxtrldbpgpool1 ~]$ # Checkpoint on primary
> psql -h 10.6.1.199 -d postgres -c "CHECKPOINT;"
> # Detach primary to trigger failover
> pcp_detach_node -h 10.6.1.54 -U pgpool -n 0
> # Verify node 1 is now primary
> pcp_node_info -a -U pgpool -h 10.6.1.54
> CHECKPOINT
> pcp_detach_node -- Command Successful
> 10.6.1.199 5432 3 0.500000 down up primary primary 0 none none 2025-12-30
> 14:22:10
> 10.6.1.200 5432 2 0.500000 up up standby standby 0 none none 2025-12-30
> 12:04:09
> [postgres@awaprodxtrldbpgpool1 ~]$ pcp_node_info -a -U pgpool -h 10.6.1.54
> 10.6.1.199 5432 3 0.500000 down down standby unknown 0 none none 2025-12-30
> 14:22:11
> 10.6.1.200 5432 2 0.500000 up up primary primary 0 none none 2025-12-30
> 14:22:11
> [postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected]
> "/usr/pgsql-18/bin/pg_ctl -D /opt/data/data18 stop"
> Authorized uses only. All activity may be monitored and reported.
> pg_ctl: PID file "/opt/data/data18/postmaster.pid" does not exist
> Is server running?
> [postgres@awaprodxtrldbpgpool1 ~]$ ssh [email protected]
> Authorized uses only. All activity may be monitored and reported.
> Last login: Tue Dec 30 11:45:09 2025 from 10.6.1.196
> -bash: typeset: TMOUT: readonly variable
> [postgres@awaproddbvmnasdwusers1 ~]$ cd /opt/data
> [postgres@awaproddbvmnasdwusers1 data]$ ll
> total 1236
> drwxr-x---.  2 postgres postgres       6 Dec 30 14:22 archive18
> drwx------. 23 postgres postgres    4096 Dec 29 12:46 data15
> drwx------. 13 postgres postgres    4096 Dec 30 14:22 data18
> drwx------.  2 postgres postgres      74 Dec  1 15:50 dbservercert
> -rw-r-----.  1 postgres postgres 1255731 Oct 24 18:04 pg_basebackup.log
> [postgres@awaproddbvmnasdwusers1 data]$ cd data18/
> [postgres@awaproddbvmnasdwusers1 data18]$ ll
> total 8
> -rw-------. 1 postgres postgres  233 Dec 30 14:22 backup_label
> drwx------. 6 postgres postgres   66 Dec 30 14:22 base
> drwx------. 2 postgres postgres 4096 Dec 30 14:22 global
> drwx------. 2 postgres postgres   26 Dec 30 14:22 pg_commit_ts
> drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_dynshmem
> drwx------. 4 postgres postgres   48 Dec 30 14:22 pg_multixact
> drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_notify
> drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_serial
> drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_snapshots
> drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_subtrans
> drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_twophase
> drwx------. 4 postgres postgres  121 Dec 30 14:22 pg_wal
> [postgres@awaproddbvmnasdwusers1 data18]$
> 
> Cheers,
> Adam
> 
> 
> On Tue, Dec 16, 2025 at 5:29 PM Tatsuo Ishii <[email protected]> wrote:
> 
>> > Thanks for the reply.
>> >
>> > Your response makes sense as I'm still setting this cluster up, so it's
>> > just me trying to connect to it.
>> >
>> > I'm curious then what the right process is for when I need to pull a node
>> > out of the cluster for maintenance (e.g. patching). I was under the
>> > impression that I should drop the node, do a pg_rewind, manually set it
>> as
>> > a standby if it was the primary, and then add the node back in pgpool. I
>> > guess I can't do that with auto failback turned on?
>>
>> Yes, while the maintenance, you should turn off auto failback.
>> In fact, it's written in the document:
>>
>> https://www.pgpool.net/docs/47/en/html/runtime-config-failover.html#RUNTIME-CONFIG-FAILOVER-SETTINGS
>>
>>     If you plan to detach standby node for maintenance, set this
>>     parameter to off beforehand. Otherwise it's possible that standby
>>     node is reattached against your intention.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>
>> > Cheers,
>> > Adam
>> >
>> >
>> > On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <[email protected]>
>> wrote:
>> >
>> >> > I'm resending this as it's been sitting in the moderation queue for a
>> >> > while. Possibly because I didn't have a subject line? Anyways, any
>> help
>> >> > would be great. Thanks!
>> >>
>> >> I received your email this time.
>> >>
>> >> > I’m setting up a pgpool cluster to replace a single node database in
>> my
>> >> > environment. The single node is separate from the cluster at the
>> moment.
>> >> > When it’s time to implement the DB I’m going to redo the
>> backup/restore,
>> >> > throw an upgrade from pg15->18, and then bring the cluster and take
>> over
>> >> > the old IP.
>> >> >
>> >> >
>> >> >
>> >> > *Environment:*
>> >> >
>> >> >    - pgpool-II version: 4.6.3 (chirikoboshi)
>> >> >    - PostgreSQL version: 18
>> >> >    - OS: RHEL9
>> >> >    - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197,
>> >> 10.6.1.198)
>> >> >    + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
>> >> >
>> >> >
>> >> >
>> >> > *Issue:*
>> >> >
>> >> > I have pgpool configured and I’ve set it up using the scripts and
>> config
>> >> > files from a different instance, one which has been running just fine
>> >> for a
>> >> > year and a half or so. The issue I’m experiencing is that when I
>> >> > detach/reattach a node, it sits in waiting constantly. It never
>> >> transitions
>> >> > to up.
>> >>
>> >> If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql
>> >> and issue an SQL command, for example "SELECT 1", does it work? If it
>> >> works, it means pgpool works fine.
>> >>
>> >> > I have to manually change the status file to up for it to get to
>> >> > agree that it is,
>> >>
>> >> The pgpool status "waiting" means that the backend node has never
>> >> revceived any query from pgpool clients yet. You can safely assume
>> >> that that pgpool is up and running. Once pgpool receives queries, the
>> >> status should be changed from "waiting" to "up".
>> >>
>> >> > and when I try to drop the node it doesn't actually drop
>> >> > it. It just goes into waiting again.
>> >>
>> >> Sounds like an effect of auto fail back. because you set:
>> >>
>> >>  auto_failback_interval = 1
>> >>
>> >> pgpool almost immediately brings the pgpool to online.
>> >>
>> >> > I also don’t see any connection
>> >> > attempts from the pgpool server to the postgres nodes if I look at
>> >> postgres
>> >> > logs. I've confirmed that it can run the postgres commands from the
>> >> command
>> >> > line. I've tried this both running pgpool as a service and running it
>> >> > directly from the command line. No difference in behavior.
>> >>
>> >> Probably there's something wrong in the configuration or trying to
>> >> connect to wrong IP and/or port. Please turn on log_client_messages
>> >> and log_per_node_statement, then send an SQL command to pgpool, and
>> >> examin the pgpool log.
>> >>
>> >> > Here’s the log output:
>> >> >
>> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG:  === Starting fail
>> back.
>> >> > reconnect host 10.6.1.200(5432) ===
>> >> >
>> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:
>> pgpool_main.c:4169
>> >> >
>> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down
>> >> > (status: 2)
>> >> >
>> >> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:
>> pgpool_main.c:1524
>> >> >
>> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG:  Do not restart
>> children
>> >> > because we are failing back node id 1 host: 10.6.1.200 port: 5432 and
>> we
>> >> > are in streaming replication mode and not all backends were down
>> >> >
>> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:
>> pgpool_main.c:4370
>> >> >
>> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOG:
>> >> > find_primary_node_repeatedly: waiting for finding a primary node
>> >> >
>> >> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:
>> pgpool_main.c:2896
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node:
>> >> primary
>> >> > node is 0
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:
>> pgpool_main.c:2815
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node:
>> >> standby
>> >> > node is 1
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:
>> pgpool_main.c:2821
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new
>> >> primary
>> >> > node: 0
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:
>> pgpool_main.c:4660
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new
>> main
>> >> > node: 0
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:
>> pgpool_main.c:4667
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  === Failback done.
>> >> > reconnect host 10.6.1.200(5432) ===
>> >> >
>> >> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4763
>> >> >
>> >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG:  worker process received restart request
>> >> >
>> >> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:  pool_worker_child.c:182
>> >> >
>> >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG:  restart request received in pcp child process
>> >> >
>> >> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION:  pcp_child.c:173
>> >> >
>> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG:  PCP child 1085087 exits with status 0 in failover()
>> >> >
>> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4850
>> >> >
>> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOG:  fork a new PCP child pid 1085089 in failover()
>> >> >
>> >> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4854
>> >> >
>> >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG:  PCP process: 1085089 started
>> >> >
>> >> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION:  pcp_child.c:165
>> >> >
>> >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG:  process started
>> >> >
>> >> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:  pgpool_main.c:905
>> >> >
>> >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG:  forked new pcp worker, pid=1085093 socket=7
>> >> >
>> >> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION:  pcp_child.c:327
>> >> >
>> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exit with SUCCESS.
>> >> >
>> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:384
>> >> >
>> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exits with status 0
>> >> >
>> >> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:398
>> >> >
>> >> > 2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback event detected
>> >> >
>> >> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself
>> >> >
>> >> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION:  child.c:1524
>> >> >
>> >> > 2025-12-03 14:25:39.480: child pid 1085038: LOG:  failover or failback event detected
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL:  restarting myself
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION:  child.c:1524
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085035: LOG:  failover or failback event detected
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL:  restarting myself
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION:  child.c:1524
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085061: LOG:  failover or failback event detected
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL:  restarting myself
>> >> >
>> >> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION:  child.c:1524
>> >> >
>> >> > 2025-12-03 14:25:39.483: child pid 1085053: LOG:  failover or failback event detected
>> >> >
>> >> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL:  restarting myself
>> >> >
>> >> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION:  child.c:1524
>> >> >
>> >> > 2025-12-03 14:25:39.483: child pid 1085059: LOG:  failover or failback event detected
>> >> >
>> >> > ... over and over and over again.
>> >> >
>> >> > pcp_node_info output:
>> >> >
>> >> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39
>> >> >
>> >> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39
>> >> >
>> >> > Logs show:
>> >> >
>> >> > node status[0]: 1
>> >> >
>> >> > node status[1]: 2
>> >> >
>> >> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2 (up).
>> >>
>> >> No, this does not show the backend status. Instead, it says
>> >>
>> >> > node status[0]: 1
>> >>
>> >> This means backend 0 is primary.
>> >>
>> >> > node status[1]: 2
>> >>
>> >> This means backend 1 is standby.
>> >>
>> >> > *auto_failback behavior:*
>> >> >
>> >> >    - When a node is detached (pcp_detach_node), it goes to status 3 (down)
>> >> >    - auto_failback triggers and moves it to status 1 (waiting)
>> >> >    - Node never transitions from waiting to up
>> >>
>> >> Sounds like pgpool has not received queries.
>> >>
>> >> > *Key configuration:*
>> >> >
>> >> > backend_clustering_mode = 'streaming_replication'
>> >> >
>> >> > backend_hostname0 = '10.6.1.199'
>> >> >
>> >> > backend_hostname1 = '10.6.1.200'
>> >> >
>> >> > backend_application_name0 = 'nasdw_users_1'
>> >> >
>> >> > backend_application_name1 = 'nasdw_users_2'
>> >> >
>> >> >
>> >> >
>> >> > use_watchdog = on
>> >> >
>> >> > # 3 watchdog nodes configured
>> >> >
>> >> >
>> >> >
>> >> > auto_failback = on
>> >> >
>> >> > auto_failback_interval = 1
>> >> >
>> >> >
>> >> >
>> >> > sr_check_period = 10
>> >> >
>> >> > sr_check_user = 'pgpool'
>> >> >
>> >> > sr_check_database = 'nasdw_users'
>> >> >
>> >> >
>> >> >
>> >> > health_check_period = 1
>> >> >
>> >> > health_check_user = 'pgpool'
>> >> >
>> >> > health_check_database = 'nasdw_users'
>> >> >
>> >> >
>> >> >
>> >> > failover_when_quorum_exists = on (default)
>> >> >
>> >> > failover_require_consensus = on (default)
>> >> >
>> >> > Cheers,
>> >> > Adam
>> >>
>>
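In the pcp_node_info output quoted above, both backends show status code 1 with the name "waiting" in the fifth column, while the sixth column (the status PostgreSQL itself reports) is "up". Reading those two columns side by side separates "pgpool has not marked the node up yet" from "PostgreSQL is unreachable". The C sketch below extracts the first six columns. The column order is our reading of the pgpool-II 4.x pcp_node_info documentation, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* First six columns of one pcp_node_info line. Column order is an
 * assumption based on the pgpool-II 4.x docs: host, port, status code,
 * load-balance weight, pgpool status name, actual backend status, ... */
struct node_info {
    char host[64];
    int port;
    int status_code;          /* column 3: 1 = up, no connections yet */
    char weight[32];          /* column 4: kept as text, e.g. "0.500000" */
    char pgpool_status[16];   /* column 5: pgpool's view, e.g. "waiting" */
    char backend_status[16];  /* column 6: what PostgreSQL reports, e.g. "up" */
};

/* Parse one line; returns 0 on success, -1 if fewer than six columns match. */
int parse_node_info(const char *line, struct node_info *ni)
{
    assert(line != NULL && ni != NULL);
    if (sscanf(line, "%63s %d %d %31s %15s %15s",
               ni->host, &ni->port, &ni->status_code, ni->weight,
               ni->pgpool_status, ni->backend_status) != 6)
        return -1;
    return 0;
}

/* Nonzero when pgpool still lists the node as "waiting" although the
 * PostgreSQL server itself is reported as "up". */
int waiting_but_reachable(const struct node_info *ni)
{
    return strcmp(ni->pgpool_status, "waiting") == 0 &&
           strcmp(ni->backend_status, "up") == 0;
}
```

Fed either line of Adam's output, parse_node_info sets pgpool_status to "waiting" and backend_status to "up", so waiting_but_reachable returns nonzero for both nodes.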

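Tatsuo's correction depends on the same small integers meaning different things in different places. In the "node status[N]: X" log lines, 1 means primary and 2 means standby. In the pcp_node_info status column, 1 means "up, no connections yet" (shown as "waiting") and 2 means "up, connections are pooled". The C sketch below keeps the two mappings separate. The function names are hypothetical, and the role-mapping values for 0 and 3 ("unused" and "invalid") are our assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Meaning of X in "node status[N]: X" log lines, per Tatsuo's reply
 * (1 = primary, 2 = standby). The 0 and 3 entries are assumptions. */
const char *node_role_name(int v)
{
    switch (v) {
    case 1: return "primary";
    case 2: return "standby";
    case 3: return "invalid";   /* assumption */
    default: return "unused";   /* assumption */
    }
}

/* Meaning of the status column in pcp_node_info output. */
const char *backend_status_name(int v)
{
    switch (v) {
    case 1: return "waiting";   /* up, no connections yet */
    case 2: return "up";        /* up, connections are pooled */
    case 3: return "down";
    default: return "unused";   /* initialization only */
    }
}

/* Decode a log line containing "node status[N]: X" into text such as
 * "backend 0 is primary". Returns 0 on success, -1 if nothing matches. */
int describe_node_status_line(const char *line, char *out, size_t n)
{
    const char *p;
    int node, role;

    assert(line != NULL && out != NULL && n > 0);
    p = strstr(line, "node status[");
    if (p == NULL || sscanf(p, "node status[%d]: %d", &node, &role) != 2)
        return -1;
    snprintf(out, n, "backend %d is %s", node, node_role_name(role));
    return 0;
}
```

describe_node_status_line("node status[0]: 1", buf, sizeof buf) yields "backend 0 is primary", whereas backend_status_name(1) is "waiting". Reading the log value against the second table is the mix-up Tatsuo points out.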

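Tatsuo's remark that pgpool "has not received queries" matches the documented meaning of pcp_node_info status 1: the node is up, but no client connection has been made through pgpool yet. As we understand it, a reattached node therefore stays "waiting" until a client session goes through pgpool, however often health checks succeed. The sketch below models that as a three-state machine. The state and event names are hypothetical, and it deliberately leaves out health checks and streaming replication checks, which (as the symptoms in this thread suggest) do not move a node from "waiting" to "up".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of pgpool's view of one backend. */
typedef enum { NODE_DOWN, NODE_WAITING, NODE_UP } node_state;

typedef enum {
    EV_DETACH,          /* pcp_detach_node or a failover */
    EV_FAILBACK,        /* auto_failback or pcp_attach_node */
    EV_CLIENT_SESSION   /* a client connects through pgpool */
} node_event;

node_state next_state(node_state s, node_event e)
{
    switch (e) {
    case EV_DETACH:
        return NODE_DOWN;
    case EV_FAILBACK:
        /* In this model, failback only affects a node that is down. */
        return s == NODE_DOWN ? NODE_WAITING : s;
    case EV_CLIENT_SESSION:
        /* The first client session is what turns "waiting" into "up". */
        return s == NODE_WAITING ? NODE_UP : s;
    }
    return s;
}

/* Apply a sequence of events in order and return the final state. */
node_state run_events(node_state s, const node_event *ev, size_t n)
{
    size_t i;

    assert(ev != NULL || n == 0);
    for (i = 0; i < n; i++)
        s = next_state(s, ev[i]);
    return s;
}
```

Starting from NODE_UP, the events {EV_DETACH, EV_FAILBACK} leave the model in NODE_WAITING, which is where the nodes in this thread sit; only appending EV_CLIENT_SESSION reaches NODE_UP.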
Thread overview: 6+ messages
2025-12-12 21:25 Pgpool can't detect database status properly Adam Blomeke <[email protected]>
2025-12-15 23:43 ` Tatsuo Ishii <[email protected]>
2025-12-16 15:57   ` Adam Blomeke <[email protected]>
2025-12-17 01:29     ` Tatsuo Ishii <[email protected]>
2025-12-30 21:59       ` Adam Blomeke <[email protected]>
2025-12-31 00:34         ` Tatsuo Ishii <[email protected]>
