From: Adam Blomeke
Date: Tue, 16 Dec 2025 10:57:16 -0500
Subject: Re: Pgpool can't detect database status properly
To: Tatsuo Ishii
Cc: pgpool-general@lists.postgresql.org

Thanks for the reply.

Your response makes sense as I'm still setting this cluster up, so it's just me trying to connect to it.

I'm curious then what the right process is for when I need to pull a node out of the cluster for maintenance (e.g. patching). I was under the impression that I should drop the node, do a pg_rewind, manually set it as a standby if it was the primary, and then add the node back in pgpool. I guess I can't do that with auto failback turned on?

Cheers,
Adam


On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
> I'm resending this as it's been sitting in the moderation queue for a
> while. Possibly because I didn't have a subject line? Anyways, any help
> would be great. Thanks!

I received your email this time.

> I’m setting up a pgpool cluster to replace a single node database in my
> environment. The single node is separate from the cluster at the moment.
> When it’s time to implement the DB I’m going to redo the backup/restore,
> throw in an upgrade from pg15->18, and then bring the cluster up and take over
> the old IP.
>
>
>
> *Environment:*
>
>    - pgpool-II version: 4.6.3 (chirikoboshi)
>    - PostgreSQL version: 18
>    - OS: RHEL9
>    - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
>      + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
>
>
>
> *Issue:*
>
> I have pgpool configured and I’ve set it up using the scripts and config
> files from a different instance, one which has been running just fine for a
> year and a half or so. The issue I’m experiencing is that when I
> detach/reattach a node, it sits in waiting constantly. It never transitions
> to up.

If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql
and issue an SQL command, for example "SELECT 1", does it work? If it
works, it means pgpool works fine.
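
A minimal sketch of that check, written as a Python helper that only builds the psql command line for each pgpool node. The port 9999 (pgpool's default listen port), the user name and the database name are assumptions here and are not taken from this thread; substitute whatever the cluster really uses.

```python
import shlex

# The three pgpool nodes named in this thread.
PGPOOL_NODES = ["10.6.1.196", "10.6.1.197", "10.6.1.198"]


def build_psql_check(host, port=9999, user="postgres", dbname="postgres"):
    """Return the argv for a one-shot "SELECT 1" sent through pgpool.

    port, user and dbname are placeholder assumptions, not values
    confirmed anywhere in this thread.
    """
    return [
        "psql",
        "-h", host,        # pgpool node to connect to
        "-p", str(port),   # pgpool listen port (assumed default)
        "-U", user,
        "-d", dbname,
        "-c", "SELECT 1",  # the query suggested in the reply
    ]


# Print the commands rather than running them, so nothing is executed
# against the cluster by accident.
for node in PGPOOL_NODES:
    print(shlex.join(build_psql_check(node)))
```

Each printed line can be pasted into a shell, or the list can be handed to subprocess.run if the check should be scripted.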

> I have to manually change the status file to up for it to get to
> agree that it is,

The pgpool status "waiting" means that the backend node has not yet
received any query from pgpool clients. You can safely assume
that pgpool is up and running. Once pgpool receives queries, the
status should change from "waiting" to "up".

> and when I try to drop the node it doesn't actually drop
> it. It just goes into waiting again.

Sounds like an effect of auto failback. Because you set:

 auto_failback_interval = 1

pgpool almost immediately brings the node back online.

> I also don’t see any connection
> attempts from the pgpool server to the postgres nodes if I look at postgres
> logs. I've confirmed that it can run the postgres commands from the command
> line. I've tried this both running pgpool as a service and running it
> directly from the command line. No difference in behavior.

Probably there's something wrong in the configuration, or pgpool is trying
to connect to the wrong IP and/or port. Please turn on log_client_messages
and log_per_node_statement, then send an SQL command to pgpool, and
examine the pgpool log.
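
When examining the log, it can help to split each line into its parts and count what shows up. The following is a rough sketch that assumes the line layout seen in the log quoted in this thread (timestamp, process name, pid, level, message); a different log_line_prefix would need a different pattern.

```python
import re
from collections import Counter

# Pattern for lines such as:
#   2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down (status: 2)
# The layout is an assumption based on the log quoted in this thread.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<process>\S+) pid (?P<pid>\d+): "
    r"(?P<level>[A-Z]+):\s*(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    # Drop mail quote markers ("> ") so lines pasted from a reply also parse.
    line = re.sub(r"^(?:>\s?)+", "", line.strip())
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    return record


sample = [
    "2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down (status: 2)",
    "> 2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback event detected",
    "> 2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself",
]
records = [r for r in map(parse_log_line, sample) if r is not None]
print(Counter(r["process"] for r in records))
```

Grouping by process or by message makes a repeating pattern, such as many children restarting at the same time, easier to spot than reading the raw log.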

> Here’s the log output:
>
> 2025-12-03 14:20:49.037: main pid 1085028: LOG:  === Starting fail back. reconnect host 10.6.1.200(5432) ===
> 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:4169
> 2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down (status: 2)
> 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:1524
> 2025-12-03 14:20:49.038: main pid 1085028: LOG:  Do not restart children because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we are in streaming replication mode and not all backends were down
> 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:4370
> 2025-12-03 14:20:49.038: main pid 1085028: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:2896
> 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: primary node is 0
> 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2815
> 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: standby node is 1
> 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2821
> 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new primary node: 0
> 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4660
> 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new main node: 0
> 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4667
> 2025-12-03 14:20:49.189: main pid 1085028: LOG:  === Failback done. reconnect host 10.6.1.200(5432) ===
> 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4763
> 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG:  worker process received restart request
> 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:  pool_worker_child.c:182
> 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG:  restart request received in pcp child process
> 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION:  pcp_child.c:173
> 2025-12-03 14:20:50.193: main pid 1085028: LOG:  PCP child 1085087 exits with status 0 in failover()
> 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4850
> 2025-12-03 14:20:50.193: main pid 1085028: LOG:  fork a new PCP child pid 1085089 in failover()
> 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4854
> 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG:  PCP process: 1085089 started
> 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION:  pcp_child.c:165
> 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG:  process started
> 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:  pgpool_main.c:905
> 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG:  forked new pcp worker, pid=1085093 socket=7
> 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION:  pcp_child.c:327
> 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exit with SUCCESS.
> 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:384
> 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exits with status 0
> 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:398
> 2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback event detected
> 2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself
> 2025-12-03 14:25:39.480: child pid 1085050: LOCATION:  child.c:1524
> 2025-12-03 14:25:39.480: child pid 1085038: LOG:  failover or failback event detected
> 2025-12-03 14:25:39.481: child pid 1085038: DETAIL:  restarting myself
> 2025-12-03 14:25:39.481: child pid 1085038: LOCATION:  child.c:1524
> 2025-12-03 14:25:39.481: child pid 1085035: LOG:  failover or failback event detected
> 2025-12-03 14:25:39.481: child pid 1085035: DETAIL:  restarting myself
> 2025-12-03 14:25:39.481: child pid 1085035: LOCATION:  child.c:1524
> 2025-12-03 14:25:39.481: child pid 1085061: LOG:  failover or failback event detected
> 2025-12-03 14:25:39.481: child pid 1085061: DETAIL:  restarting myself
> 2025-12-03 14:25:39.481: child pid 1085061: LOCATION:  child.c:1524
> 2025-12-03 14:25:39.483: child pid 1085053: LOG:  failover or failback event detected
> 2025-12-03 14:25:39.483: child pid 1085053: DETAIL:  restarting myself
> 2025-12-03 14:25:39.483: child pid 1085053: LOCATION:  child.c:1524
> 2025-12-03 14:25:39.483: child pid 1085059: LOG:  failover or failback event detected
>
> ......over and over and over again.
>
>
>
>
> pcp_node_info output:
>
> 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39
>
> 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39
>
> Logs show:
>
> node status[0]: 1
>
> node status[1]: 2
>
> Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
> (up).

No, this does not show the backend status. Instead, it says

> node status[0]: 1

This means backend 0 is primary.

> node status[1]: 2

This means backend 1 is standby.
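
The backend status itself is easier to read from the pcp_node_info lines quoted earlier. Below is a small sketch that splits one such line into named fields. The field names are labels chosen for this example, and the column order is an assumption based on the non-verbose pcp_node_info output of recent pgpool-II releases, so it should be checked against the documentation for the installed version.

```python
from typing import NamedTuple


class NodeInfo(NamedTuple):
    # Field names are illustrative labels, not official pgpool identifiers.
    hostname: str
    port: int
    status_code: int         # per this thread: 1 = waiting, 2 = up, 3 = down
    lb_weight: float
    pgpool_status: str       # what pgpool believes about the node
    actual_status: str       # what the backend itself reports
    role: str
    actual_role: str
    replication_delay: int
    replication_state: str
    sync_state: str
    last_status_change: str


def parse_node_info(line):
    """Split one non-verbose pcp_node_info line into a NodeInfo."""
    parts = line.split()
    if len(parts) != 13:
        raise ValueError(f"expected 13 fields, got {len(parts)}")
    return NodeInfo(
        hostname=parts[0],
        port=int(parts[1]),
        status_code=int(parts[2]),
        lb_weight=float(parts[3]),
        pgpool_status=parts[4],
        actual_status=parts[5],
        role=parts[6],
        actual_role=parts[7],
        replication_delay=int(parts[8]),
        replication_state=parts[9],
        sync_state=parts[10],
        last_status_change=" ".join(parts[11:13]),  # date and time
    )


node = parse_node_info(
    "10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async "
    "2025-12-03 14:04:39"
)
print(node.hostname, node.pgpool_status, node.actual_status, node.role)
```

Read this way, both quoted lines show pgpool status "waiting" next to an actual backend status of "up", which fits the explanation that the nodes are reachable and simply have not served a client query yet.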

> *auto_failback behavior:*
>
>    - When a node is detached (pcp_detach_node), it goes to status 3 (down)
>    - auto_failback triggers and moves it to status 1 (waiting)
>    - Node never transitions from waiting to up

Sounds like pgpool has not received queries.

> *Key configuration:*
>
> backend_clustering_mode = 'streaming_replication'
> backend_hostname0 = '10.6.1.199'
> backend_hostname1 = '10.6.1.200'
> backend_application_name0 = 'nasdw_users_1'
> backend_application_name1 = 'nasdw_users_2'
>
> use_watchdog = on
> # 3 watchdog nodes configured
>
> auto_failback = on
> auto_failback_interval = 1
>
> sr_check_period = 10
> sr_check_user = 'pgpool'
> sr_check_database = 'nasdw_users'
>
> health_check_period = 1
> health_check_user = 'pgpool'
> health_check_database = 'nasdw_users'
>
> failover_when_quorum_exists = on (default)
> failover_require_consensus = on (default)
> Cheers,
> Adam
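
Since the reply points at auto_failback_interval = 1 as the reason a detached node comes straight back, it may be worth confirming what a given configuration file actually sets before detaching a node for maintenance. The sketch below parses simple "key = value" lines like the ones quoted above. It is a simplified reader written for this example, not pgpool's own parser, and it would mishandle a '#' inside a quoted value.

```python
def parse_pgpool_conf(text):
    """Parse simple "key = value" lines into a dict of strings.

    Simplified on purpose: comments start at '#', and surrounding
    single quotes are stripped from values.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'")
    return settings


# Values copied from the configuration quoted in this thread.
sample = """
auto_failback = on
auto_failback_interval = 1
# 3 watchdog nodes configured
sr_check_period = 10
health_check_period = 1
health_check_user = 'pgpool'
"""

conf = parse_pgpool_conf(sample)
for key in ("auto_failback", "auto_failback_interval", "health_check_period"):
    print(f"{key} = {conf[key]}")
```

In practice the text would be read from the real pgpool.conf; the path to that file is site-specific and is left out here.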