MIME-Version: 1.0
References: 
 <CAG9Amsj61Oy7YGJw40uv7fedZwabLu2b5dVmJK4aDrZLDMVj+w@mail.gmail.com>
 <20251216.084325.87549965646144515.ishii@postgresql.org>
In-Reply-To: <20251216.084325.87549965646144515.ishii@postgresql.org>
From: Adam Blomeke <adam.blomeke@gmail.com>
Date: Tue, 16 Dec 2025 10:57:16 -0500
Message-ID: 
 <CAG9AmsigfRELb2B1K17vJ7YB_5xw0eVXY9AUmH3vj74RcLY08w@mail.gmail.com>
Subject: Re: Pgpool can't detect database status properly
To: Tatsuo Ishii <ishii@postgresql.org>
Cc: pgpool-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="000000000000dcaad6064613c8be"
Archived-At: 
 <https://www.postgresql.org/message-id/CAG9AmsigfRELb2B1K17vJ7YB_5xw0eVXY9AUmH3vj74RcLY08w%40mail.gmail.com>
Precedence: bulk

--000000000000dcaad6064613c8be
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks for the reply.

Your response makes sense as I'm still setting this cluster up, so it's
just me trying to connect to it.

I'm curious then what the right process is for when I need to pull a node
out of the cluster for maintenance (e.g. patching). I was under the
impression that I should drop the node, do a pg_rewind, manually set it as
a standby if it was the primary, and then add the node back in pgpool. I
guess I can't do that with auto failback turned on?
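
To make the question concrete, this is roughly the sequence I had in mind, written as a dry-run sketch that only prints the commands. The PCP port, the PCP user, the data directory, and the connection string are placeholders I made up for illustration; only the IP addresses are from my setup.

```python
import shlex

# Placeholders for illustration; only the IP addresses come from my setup.
PCP_HOST = "10.6.1.196"             # one of the three pgpool nodes
PCP_PORT = 9898                     # assumed PCP port
PCP_USER = "pgpool"                 # assumed PCP user
PGDATA = "/var/lib/pgsql/18/data"   # assumed data directory on the backend


def maintenance_plan(node_id, was_primary, current_primary):
    """Return the commands I expected to run, in order, as argv lists."""
    pcp_args = ["-h", PCP_HOST, "-p", str(PCP_PORT), "-U", PCP_USER,
                "-n", str(node_id)]
    plan = [
        # 1. Drop the node from pgpool.
        ["pcp_detach_node", *pcp_args],
        # 2. Resync the data directory against the current primary.
        #    (pg_rewind needs the target server shut down cleanly first.)
        ["pg_rewind", f"--target-pgdata={PGDATA}",
         f"--source-server=host={current_primary} port=5432 dbname=postgres"],
    ]
    if was_primary:
        # 3. Mark the old primary as a standby before starting it again
        #    (primary_conninfo would also have to point at the new primary).
        plan.append(["touch", f"{PGDATA}/standby.signal"])
    # 4. Add the node back in pgpool.
    plan.append(["pcp_attach_node", *pcp_args])
    return plan


# Steps 1 and 4 talk to pgpool; steps 2 and 3 run on the backend host itself.
for cmd in maintenance_plan(node_id=1, was_primary=False,
                            current_primary="10.6.1.199"):
    print(shlex.join(cmd))
```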

Cheers,
Adam


On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <ishii@postgresql.org> wrote:

> > I'm resending this as it's been sitting in the moderation queue for a
> > while. Possibly because I didn't have a subject line? Anyways, any help
> > would be great. Thanks!
>
> I received your email this time.
>
> > I’m setting up a pgpool cluster to replace a single node database in my
> > environment. The single node is separate from the cluster at the moment.
> > When it’s time to implement the DB I’m going to redo the backup/restore,
> > throw an upgrade from pg15->18, and then bring the cluster and take over
> > the old IP.
> >
> >
> >
> > *Environment:*
> >
> >    - pgpool-II version: 4.6.3 (chirikoboshi)
> >    - PostgreSQL version: 18
> >    - OS: RHEL9
> >    - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
> >      + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
> >
> >
> >
> > *Issue:*
> >
> > I have pgpool configured and I’ve set it up using the scripts and config
> > files from a different instance, one which has been running just fine for a
> > year and a half or so. The issue I’m experiencing is that when I
> > detach/reattach a node, it sits in waiting constantly. It never transitions
> > to up.
>
> If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql
> and issue an SQL command, for example "SELECT 1", does it work? If it
> works, it means pgpool works fine.
>
> > I have to manually change the status file to up for it to get to
> > agree that it is,
>
> The pgpool status "waiting" means that the backend node has not yet
> received any query from pgpool clients. You can safely assume
> that pgpool is up and running. Once pgpool receives queries, the
> status should change from "waiting" to "up".
>
> > and when I try to drop the node it doesn't actually drop
> > it. It just goes into waiting again.
>
> Sounds like an effect of auto failback. Because you set:
>
>  auto_failback_interval = 1
>
> pgpool brings the detached node back online almost immediately.
>
> > I also don’t see any connection
> > attempts from the pgpool server to the postgres nodes if I look at postgres
> > logs. I've confirmed that it can run the postgres commands from the command
> > line. I've tried this both running pgpool as a service and running it
> > directly from the command line. No difference in behavior.
>
> Probably there's something wrong in the configuration, or pgpool is trying
> to connect to the wrong IP and/or port. Please turn on log_client_messages
> and log_per_node_statement, then send an SQL command to pgpool, and
> examine the pgpool log.
>
> > Here’s the log output:
> >
> > 2025-12-03 14:20:49.037: main pid 1085028: LOG:  === Starting fail back. reconnect host 10.6.1.200(5432) ===
> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:4169
> > 2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down (status: 2)
> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:1524
> > 2025-12-03 14:20:49.038: main pid 1085028: LOG:  Do not restart children because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we are in streaming replication mode and not all backends were down
> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:4370
> > 2025-12-03 14:20:49.038: main pid 1085028: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:2896
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: primary node is 0
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2815
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: standby node is 1
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2821
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new primary node: 0
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4660
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new main node: 0
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4667
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  === Failback done. reconnect host 10.6.1.200(5432) ===
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4763
> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG:  worker process received restart request
> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:  pool_worker_child.c:182
> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG:  restart request received in pcp child process
> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION:  pcp_child.c:173
> > 2025-12-03 14:20:50.193: main pid 1085028: LOG:  PCP child 1085087 exits with status 0 in failover()
> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4850
> > 2025-12-03 14:20:50.193: main pid 1085028: LOG:  fork a new PCP child pid 1085089 in failover()
> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4854
> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG:  PCP process: 1085089 started
> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION:  pcp_child.c:165
> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG:  process started
> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:  pgpool_main.c:905
> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG:  forked new pcp worker, pid=1085093 socket=7
> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION:  pcp_child.c:327
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exit with SUCCESS.
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:384
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exits with status 0
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:398
> > 2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback event detected
> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself
> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION:  child.c:1524
> > 2025-12-03 14:25:39.480: child pid 1085038: LOG:  failover or failback event detected
> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL:  restarting myself
> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION:  child.c:1524
> > 2025-12-03 14:25:39.481: child pid 1085035: LOG:  failover or failback event detected
> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL:  restarting myself
> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION:  child.c:1524
> > 2025-12-03 14:25:39.481: child pid 1085061: LOG:  failover or failback event detected
> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL:  restarting myself
> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION:  child.c:1524
> > 2025-12-03 14:25:39.483: child pid 1085053: LOG:  failover or failback event detected
> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL:  restarting myself
> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION:  child.c:1524
> > 2025-12-03 14:25:39.483: child pid 1085059: LOG:  failover or failback event detected
> >
> > ......over and over and over again.
> >
> >
> >
> >
> > pcp_node_info output:
> >
> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39
> >
> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39
> >
> > Logs show:
> >
> > node status[0]: 1
> >
> > node status[1]: 2
> >
> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
> > (up).
>
> No, this does not show the backend status. Instead, it says
>
> > node status[0]: 1
>
> This means backend 0 is primary.
>
> > node status[1]: 2
>
> This means backend 1 is standby.
>
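
Understood. So that I stop mixing these up, here is a minimal sketch of how I now read a pcp_node_info line, as opposed to the "node status[N]" log lines. The field names are my own labels for the columns, based on my reading of the pcp_node_info documentation, and the helper functions are hypothetical:

```python
from typing import NamedTuple


class NodeInfo(NamedTuple):
    host: str
    port: int
    status: int               # pgpool's view: 1 = waiting, 2 = up, 3 = down
    weight: float
    status_name: str
    backend_status: str       # actual backend state as probed by pgpool
    role: str
    actual_role: str
    replication_delay: int
    replication_state: str
    sync_state: str
    last_status_change: str


def parse_node_info(line: str) -> NodeInfo:
    """Split one whitespace-separated pcp_node_info line into named fields."""
    f = line.split()
    if len(f) != 13:
        raise ValueError(f"expected 13 fields, got {len(f)}")
    return NodeInfo(f[0], int(f[1]), int(f[2]), float(f[3]), f[4], f[5],
                    f[6], f[7], int(f[8]), f[9], f[10], f"{f[11]} {f[12]}")


def is_usable(node: NodeInfo) -> bool:
    # "waiting" only means no client query has gone through pgpool yet,
    # so treat it the same as "up" as long as the backend itself is up.
    return node.status in (1, 2) and node.backend_status == "up"


line = ("10.6.1.199 5432 1 0.500000 waiting up primary primary "
        "0 none none 2025-12-03 14:04:39")
node = parse_node_info(line)
print(node.status_name, node.role, is_usable(node))  # waiting primary True
```

Read this way, both of my nodes were healthy all along; they just had not served a client query yet.
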
> > *auto_failback behavior:*
> >
> >    - When a node is detached (pcp_detach_node), it goes to status 3 (down)
> >    - auto_failback triggers and moves it to status 1 (waiting)
> >    - Node never transitions from waiting to up
>
> Sounds like pgpool has not received queries.
>
> > *Key configuration:*
> >
> > backend_clustering_mode = 'streaming_replication'
> > backend_hostname0 = '10.6.1.199'
> > backend_hostname1 = '10.6.1.200'
> > backend_application_name0 = 'nasdw_users_1'
> > backend_application_name1 = 'nasdw_users_2'
> >
> > use_watchdog = on
> > # 3 watchdog nodes configured
> >
> > auto_failback = on
> > auto_failback_interval = 1
> >
> > sr_check_period = 10
> > sr_check_user = 'pgpool'
> > sr_check_database = 'nasdw_users'
> >
> > health_check_period = 1
> > health_check_user = 'pgpool'
> > health_check_database = 'nasdw_users'
> >
> > failover_when_quorum_exists = on (default)
> > failover_require_consensus = on (default)
> >
> > Cheers,
> > Adam
>

--000000000000dcaad6064613c8be--