From: Adam Blomeke
Date: Tue, 30 Dec 2025 16:59:57 -0500
Subject: Re: Pgpool can't detect database status properly
To: Tatsuo Ishii
Cc: pgpool-general@lists.postgresql.org
In-Reply-To: <20251217.102935.700318527231632102.ishii@postgresql.org>

Follow up on this. I've got auto_failback set to off, but it's executing the follow primary script on the node as soon as I attempt to detach it. Is this expected behavior?

[postgres@awaprodxtrldbpgpool1 ~]$ grep auto_failback /etc/pgpool-II/pgpool.conf
auto_failback = off
auto_failback_interval = 1
                                   # Min interval of executing auto_failback in
[postgres@awaprodxtrldbpgpool1 ~]$ # Checkpoint on primary
psql -h 10.6.1.199 -d postgres -c "CHECKPOINT;"
# Detach primary to trigger failover
pcp_detach_node -h 10.6.1.54 -U pgpool -n 0
# Verify node 1 is now primary
pcp_node_info -a -U pgpool -h 10.6.1.54
CHECKPOINT
pcp_detach_node -- Command Successful
10.6.1.199 5432 3 0.500000 down up primary primary 0 none none 2025-12-30 14:22:10
10.6.1.200 5432 2 0.500000 up up standby standby 0 none none 2025-12-30 12:04:09
[postgres@awaprodxtrldbpgpool1 ~]$ pcp_node_info -a -U pgpool -h 10.6.1.54
10.6.1.199 5432 3 0.500000 down down standby unknown 0 none none 2025-12-30 14:22:11
10.6.1.200 5432 2 0.500000 up up primary primary 0 none none 2025-12-30 14:22:11
[postgres@awaprodxtrldbpgpool1 ~]$ ssh postgres@10.6.1.199 "/usr/pgsql-18/bin/pg_ctl -D /opt/data/data18 stop"
Authorized uses only. All activity may be monitored and reported.
pg_ctl: PID file "/opt/data/data18/postmaster.pid" does not exist
Is server running?
[postgres@awaprodxtrldbpgpool1 ~]$ ssh postgres@10.6.1.199
Authorized uses only. All activity may be monitored and reported.
Last login: Tue Dec 30 11:45:09 2025 from 10.6.1.196
-bash: typeset: TMOUT: readonly variable
[postgres@awaproddbvmnasdwusers1 ~]$ cd /opt/data
[postgres@awaproddbvmnasdwusers1 data]$ ll
total 1236
drwxr-x---.  2 postgres postgres       6 Dec 30 14:22 archive18
drwx------. 23 postgres postgres    4096 Dec 29 12:46 data15
drwx------. 13 postgres postgres    4096 Dec 30 14:22 data18
drwx------.  2 postgres postgres      74 Dec  1 15:50 dbservercert
-rw-r-----.  1 postgres postgres 1255731 Oct 24 18:04 pg_basebackup.log
[postgres@awaproddbvmnasdwusers1 data]$ cd data18/
[postgres@awaproddbvmnasdwusers1 data18]$ ll
total 8
-rw-------. 1 postgres postgres  233 Dec 30 14:22 backup_label
drwx------. 6 postgres postgres   66 Dec 30 14:22 base
drwx------. 2 postgres postgres 4096 Dec 30 14:22 global
drwx------. 2 postgres postgres   26 Dec 30 14:22 pg_commit_ts
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_dynshmem
drwx------. 4 postgres postgres   48 Dec 30 14:22 pg_multixact
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_notify
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_serial
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_snapshots
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_subtrans
drwx------. 2 postgres postgres   10 Dec 30 14:22 pg_twophase
drwx------. 4 postgres postgres  121 Dec 30 14:22 pg_wal
[postgres@awaproddbvmnasdwusers1 data18]$
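
When comparing pcp_node_info output before and after a detach, it can help to flag only the nodes where pgpool's status disagrees with the backend's actual status. The following is a minimal sketch, not an official tool. The function name pcp_mismatch is hypothetical, and the column layout is an assumption read off the output pasted in this thread (column 5 is pgpool's status, column 6 the actual backend status).

```shell
# Sketch: flag nodes where pgpool's view of a backend disagrees with the
# backend's actual state. Reads `pcp_node_info -a` output on stdin.
# Assumed column layout (taken from the output pasted in this thread):
#   $1 host  $2 port  $3 status code  $4 weight
#   $5 pgpool status  $6 actual status  $7 pgpool role  $8 actual role
pcp_mismatch() {
    awk 'NF >= 8 {
        # "waiting" means up with no client queries yet, so count it as up
        status = ($5 == "waiting") ? "up" : $5
        if (status != $6)
            printf "%s:%s pgpool=%s actual=%s\n", $1, $2, $5, $6
    }'
}
```

Typical use would be `pcp_node_info -a -U pgpool -h 10.6.1.54 | pcp_mismatch`. Empty output means pgpool and the backends agree on up/down state.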

Cheers,
Adam

On Tue, Dec 16, 2025 at 5:29 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
> Thanks for the reply.
>
> Your response makes sense as I'm still setting this cluster up, so it's
> just me trying to connect to it.
>
> I'm curious then what the right process is for when I need to pull a node
> out of the cluster for maintenance (e.g. patching). I was under the
> impression that I should drop the node, do a pg_rewind, manually set it as
> a standby if it was the primary, and then add the node back in pgpool. I
> guess I can't do that with auto failback turned on?

Yes, during maintenance you should turn off auto failback.
In fact, it's written in the documentation:
https://www.pgpool.net/docs/47/en/html/runtime-config-failover.html#RUNTIME-CONFIG-FAILOVER-SETTINGS

    If you plan to detach standby node for maintenance, set this
    parameter to off beforehand. Otherwise it's possible that standby
    node is reattached against your intention.
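
A quick pre-maintenance check of the setting can be scripted. This is only a sketch under stated assumptions. The helper names conf_value and auto_failback_is_off are hypothetical, and the parser assumes simple `key = value # comment` lines with the last occurrence of a key winning. It matches the key exactly, so auto_failback is not confused with auto_failback_interval the way a plain grep is.

```shell
# Sketch: print the value of one parameter from a pgpool.conf-style file.
# Limitation: a "#" inside a quoted value would be mistaken for a comment.
conf_value() {
    # $1 = path to the config file, $2 = parameter name
    awk -v key="$2" -v q="'" '
        /^[ \t]*#/ { next }                 # skip comment-only lines
        {
            line = $0
            sub(/#.*/, "", line)            # drop a trailing comment
            n = index(line, "=")
            if (n == 0) next
            k = substr(line, 1, n - 1)
            v = substr(line, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            gsub("^" q "|" q "$", "", v)    # strip surrounding single quotes
            if (k == key) val = v           # assumption: last occurrence wins
        }
        END { if (val != "") print val }
    ' "$1"
}

# Succeeds only when the file explicitly says auto_failback = off.
# Other boolean spellings are deliberately not handled in this sketch.
auto_failback_is_off() {
    [ "$(conf_value "$1" auto_failback)" = "off" ]
}
```

For example, `auto_failback_is_off /etc/pgpool-II/pgpool.conf || echo "auto_failback is not off"`. This only inspects the file, so it says nothing about whether the running pgpool has picked up the change.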

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Cheers,
> Adam
>
>
> On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
>
>> > I'm resending this as it's been sitting in the moderation queue for a
>> > while. Possibly because I didn't have a subject line? Anyways, any help
>> > would be great. Thanks!
>>
>> I received your email this time.
>>
>> > I’m setting up a pgpool cluster to replace a single node database in my
>> > environment. The single node is separate from the cluster at the moment.
>> > When it’s time to implement the DB I’m going to redo the backup/restore,
>> > throw in an upgrade from pg15->18, and then bring the cluster up and take
>> > over the old IP.
>> >
>> >
>> >
>> > *Environment:*
>> >
>> >    - pgpool-II version: 4.6.3 (chirikoboshi)
>> >    - PostgreSQL version: 18
>> >    - OS: RHEL9
>> >    - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
>> >      + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
>> >
>> >
>> >
>> > *Issue:*
>> >
>> > I have pgpool configured and I’ve set it up using the scripts and config
>> > files from a different instance, one which has been running just fine for a
>> > year and a half or so. The issue I’m experiencing is that when I
>> > detach/reattach a node, it sits in waiting constantly. It never transitions
>> > to up.
>>
>> If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql
>> and issue an SQL command, for example "SELECT 1", does it work? If it
>> works, it means pgpool works fine.
>>
>> > I have to manually change the status file to up for it to get to
>> > agree that it is,
>>
>> The pgpool status "waiting" means that the backend node has not yet
>> received any query from pgpool clients. You can safely assume
>> that pgpool is up and running. Once pgpool receives queries, the
>> status should be changed from "waiting" to "up".
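
For reference, the numeric status column in pcp_node_info lines up with these names. The sketch below is an illustration only. The function name pcp_status_name is hypothetical. Codes 1, 2 and 3 are the ones discussed in this thread, and code 0 (used only during initialization) is my understanding of the pgpool documentation rather than something shown here.

```shell
# Sketch: map the numeric status column of pcp_node_info to a label.
pcp_status_name() {
    case "$1" in
        0) echo "initializing" ;;  # assumption: only seen during startup
        1) echo "waiting" ;;       # node is up, no client queries yet
        2) echo "up" ;;            # node is up and has served queries
        3) echo "down" ;;          # node is detached or failed
        *) echo "unknown" ;;
    esac
}
```

So a node reported as `1 ... waiting` is healthy from pgpool's point of view and should move to `2 ... up` after the first query is routed to it.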
>>
>> > and when I try to drop the node it doesn't actually drop
>> > it. It just goes into waiting again.
>>
>> Sounds like an effect of auto failback, because you set:
>>
>>  auto_failback_interval = 1
>>
>> pgpool almost immediately brings the node back online.
>>
>> > I also don’t see any connection
>> > attempts from the pgpool server to the postgres nodes if I look at postgres
>> > logs. I've confirmed that it can run the postgres commands from the command
>> > line. I've tried this both running pgpool as a service and running it
>> > directly from the command line. No difference in behavior.
>>
>> Probably there's something wrong in the configuration, or pgpool is trying
>> to connect to the wrong IP and/or port. Please turn on log_client_messages
>> and log_per_node_statement, then send an SQL command to pgpool, and
>> examine the pgpool log.
>>
>> > Here’s the log output:
>> >
>> > 2025-12-03 14:20:49.037: main pid 1085028: LOG:  === Starting fail back. reconnect host 10.6.1.200(5432) ===
>> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:4169
>> > 2025-12-03 14:20:49.037: main pid 1085028: LOG:  Node 0 is not down (status: 2)
>> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION:  pgpool_main.c:1524
>> > 2025-12-03 14:20:49.038: main pid 1085028: LOG:  Do not restart children because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we are in streaming replication mode and not all backends were down
>> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:4370
>> > 2025-12-03 14:20:49.038: main pid 1085028: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
>> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION:  pgpool_main.c:2896
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: primary node is 0
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2815
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  find_primary_node: standby node is 1
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:2821
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new primary node: 0
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4660
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  failover: set new main node: 0
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4667
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOG:  === Failback done. reconnect host 10.6.1.200(5432) ===
>> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION:  pgpool_main.c:4763
>> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG:  worker process received restart request
>> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:  pool_worker_child.c:182
>> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG:  restart request received in pcp child process
>> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION:  pcp_child.c:173
>> > 2025-12-03 14:20:50.193: main pid 1085028: LOG:  PCP child 1085087 exits with status 0 in failover()
>> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4850
>> > 2025-12-03 14:20:50.193: main pid 1085028: LOG:  fork a new PCP child pid 1085089 in failover()
>> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION:  pgpool_main.c:4854
>> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG:  PCP process: 1085089 started
>> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION:  pcp_child.c:165
>> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG:  process started
>> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:  pgpool_main.c:905
>> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG:  forked new pcp worker, pid=1085093 socket=7
>> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION:  pcp_child.c:327
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exit with SUCCESS.
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:384
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG:  PCP process with pid: 1085093 exits with status 0
>> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION:  pcp_child.c:398
>> > 2025-12-03 14:25:39.480: child pid 1085050: LOG:  failover or failback event detected
>> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL:  restarting myself
>> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION:  child.c:1524
>> > 2025-12-03 14:25:39.480: child pid 1085038: LOG:  failover or failback event detected
>> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL:  restarting myself
>> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION:  child.c:1524
>> > 2025-12-03 14:25:39.481: child pid 1085035: LOG:  failover or failback event detected
>> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL:  restarting myself
>> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION:  child.c:1524
>> > 2025-12-03 14:25:39.481: child pid 1085061: LOG:  failover or failback event detected
>> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL:  restarting myself
>> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION:  child.c:1524
>> > 2025-12-03 14:25:39.483: child pid 1085053: LOG:  failover or failback event detected
>> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL:  restarting myself
>> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION:  child.c:1524
>> > 2025-12-03 14:25:39.483: child pid 1085059: LOG:  failover or failback event detected
>> >
>> > ......over and over and over again.
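
To tell whether the child restarts really repeat over time or all belong to one burst, the messages can be counted per minute. This is a minimal sketch assuming the log line layout shown in this thread (date, then time, then process info). The function name failback_msgs_per_minute is hypothetical, and it expects the raw log file, not the email-wrapped copy.

```shell
# Sketch: count "failover or failback event detected" messages per minute
# in a pgpool log read from stdin. Each child process logs the message once,
# so a single failback event produces one line per child.
failback_msgs_per_minute() {
    awk '/failover or failback event detected/ {
        minute = $1 " " substr($2, 1, 5)   # e.g. "2025-12-03 14:25"
        count[minute]++
    }
    END { for (m in count) print m, count[m] }' | sort
}
```

Usage would be something like `failback_msgs_per_minute < pgpool.log`, where the log path is a placeholder. One minute with many lines points to a single event seen by many children. Many separate minutes point to repeated failback.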
>> >
>> >
>> >
>> >
>> > pcp_node_info output:
>> >
>> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none 2025-12-03 14:04:39
>> >
>> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async 2025-12-03 14:04:39
>> >
>> > Logs show:
>> >
>> > node status[0]: 1
>> >
>> > node status[1]: 2
>> >
>> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
>> > (up).
>>
>> No, this does not show the backend status. Instead, it says
>>
>> > node status[0]: 1
>>
>> This means backend 0 is primary.
>>
>> > node status[1]: 2
>>
>> This means backend 1 is standby.
>>
>> > *auto_failback behavior:*
>> >
>> >    - When a node is detached (pcp_detach_node), it goes to status 3 (down)
>> >    - auto_failback triggers and moves it to status 1 (waiting)
>> >    - Node never transitions from waiting to up
>>
>> Sounds like pgpool has not received queries.
>>
>> > *Key configuration:*
>> >
>> > backend_clustering_mode = 'streaming_replication'
>> > backend_hostname0 = '10.6.1.199'
>> > backend_hostname1 = '10.6.1.200'
>> > backend_application_name0 = 'nasdw_users_1'
>> > backend_application_name1 = 'nasdw_users_2'
>> >
>> > use_watchdog = on
>> > # 3 watchdog nodes configured
>> >
>> > auto_failback = on
>> > auto_failback_interval = 1
>> >
>> > sr_check_period = 10
>> > sr_check_user = 'pgpool'
>> > sr_check_database = 'nasdw_users'
>> >
>> > health_check_period = 1
>> > health_check_user = 'pgpool'
>> > health_check_database = 'nasdw_users'
>> >
>> > failover_when_quorum_exists = on (default)
>> > failover_require_consensus = on (default)
>> > Cheers,
>> > Adam
>>