public inbox for [email protected]  
From: Koshino Taiki <[email protected]>
To: [email protected] <[email protected]>
Subject: Proposal: Add lifecheck started status to pcp_watchdog_info.
Date: Thu, 26 Mar 2026 08:15:54 +0000
Message-ID: <TY4PR01MB173744EDFBDCE68DDECAE746B9456A@TY4PR01MB17374.jpnprd01.prod.outlook.com>

Hi,
This patch adds a new field, "Lifecheck Started", to the pcp_watchdog_info output.
Currently, users need to inspect logs to verify whether the lifecheck has started on each node.
This change allows the status to be checked directly from a single command, making it easier to verify behavior and perform regression testing.
For example:

$ pcp_watchdog_info -h localhost -p 9898 -U pgpool -v
Password:
Watchdog Cluster Information
Total Nodes              : 3
Remote Nodes             : 2
Member Remote Nodes      : 2
Alive Remote Nodes       : 2
Nodes required for quorum: 2
Quorum state             : QUORUM EXIST
Local node escalation    : YES
Leader Node Name         : server1:9999 Linux server1.localdomain
Leader Host Name         : server1

Watchdog Node Information
Node Name         : server1:9999 Linux server1.localdomain
Host Name         : server1
Delegate IP       : 192.168.56.150
Pgpool port       : 9999
Watchdog port     : 9000
Node priority     : 1
Status            : 4
Status Name       : LEADER
Membership Status : MEMBER
Lifecheck Started : YES
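As a rough illustration of the regression-testing use case, a test could scan the verbose output and count how many node sections report YES or NO. The helper below is a hypothetical sketch, not part of the patch. It assumes only the "Lifecheck Started : " line format printed by the patched output_watchdog_info_result().

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: scan the text printed by "pcp_watchdog_info -v"
 * and count the node sections that report "Lifecheck Started : YES"
 * and "Lifecheck Started : NO". Returns the number of such lines seen.
 */
static int
count_lifecheck_started(const char *output, int *started, int *not_started)
{
    const char *prefix = "Lifecheck Started : ";
    size_t      prefix_len = strlen(prefix);
    const char *line = output;
    int         nodes = 0;

    assert(output != NULL && started != NULL && not_started != NULL);
    *started = 0;
    *not_started = 0;

    while (*line != '\0')
    {
        const char *eol = strchr(line, '\n');
        size_t      len = eol ? (size_t) (eol - line) : strlen(line);

        if (len >= prefix_len && strncmp(line, prefix, prefix_len) == 0)
        {
            const char *value = line + prefix_len;
            size_t      vlen = len - prefix_len;

            nodes++;
            if (vlen == 3 && strncmp(value, "YES", 3) == 0)
                (*started)++;
            else if (vlen == 2 && strncmp(value, "NO", 2) == 0)
                (*not_started)++;
        }
        if (eol == NULL)
            break;
        line = eol + 1;         /* continue with the next line */
    }
    return nodes;
}
```

A test could then compare the started count with the "Total Nodes" value from the same output and fail if any node has not reported in.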

Best regards,
Taiki Koshino <[email protected]>
SRA OSS K.K.
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/



Attachments:

  [application/octet-stream] v9-0001-Add-Lifecheck-Started-status-to-pcp_watchdog_info.patch (13.9K, 3-v9-0001-Add-Lifecheck-Started-status-to-pcp_watchdog_info.patch)
From f353f922222f70fad1d104f76adca1010578d40a Mon Sep 17 00:00:00 2001
From: Taiki Koshino <[email protected]>
Date: Thu, 26 Mar 2026 17:04:37 +0900
Subject: [PATCH v9] Add Lifecheck Started status to pcp_watchdog_info output.

This commit enhances the pcp_watchdog_info command by adding a new field, Lifecheck Started,
which indicates whether lifecheck has been started on each watchdog node (NO: not started, YES: started).

This allows users to check the lifecheck status directly from the command output without inspecting logs.

Add a lifecheck_started member to WatchdogNode. When the lifecheck process
detects that lifecheck has started, it notifies the watchdog process, which
sets lifecheck_started to true for the local node and broadcasts its node
information, so the new status propagates to the other watchdog nodes.

Add a lifecheck_started field to the node information returned to
pcp_watchdog_info (PCPWDNodeInfo), so that the command displays the latest
lifecheck_started status of each node.
---
 doc.ja/src/sgml/ref/pcp_watchdog_info.sgml | 11 ++++++++---
 doc/src/sgml/ref/pcp_watchdog_info.sgml    | 10 +++++++---
 src/include/pcp/pcp.h                      |  2 ++
 src/include/watchdog/watchdog.h            |  2 ++
 src/include/watchdog/wd_commands.h         |  2 ++
 src/include/watchdog/wd_ipc_defines.h      |  2 +-
 src/include/watchdog/wd_lifecheck.h        |  3 ++-
 src/libs/pcp/pcp.c                         |  6 ++++++
 src/tools/pcp/pcp_frontend_client.c        |  8 +++++---
 src/watchdog/watchdog.c                    | 13 +++++++++++++
 src/watchdog/wd_commands.c                 |  7 +++++++
 src/watchdog/wd_json_data.c                |  4 +++-
 src/watchdog/wd_lifecheck.c                | 10 +++++++++-
 13 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml b/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml
index 85ca81e10..61837969c 100644
--- a/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml
+++ b/doc.ja/src/sgml/ref/pcp_watchdog_info.sgml
@@ -108,9 +108,9 @@ $ pcp_watchdog_info -h localhost -p 9898 -U postgres
 Password: 
 3 3 YES server1:9999 Linux server1.localdomain server1
 
-server1:9999 Linux server1.localdomain server1 9999 9000 4 LEADER 0 MEMBER
-server2:9999 Linux server2.localdomain server2 9999 9000 7 STANDBY 0 MEMBER
-server3:9999 Linux server3.localdomain server3 9999 9000 7 STANDBY 0 MEMBER
+server1:9999 Linux server1.localdomain server1 9999 9000 4 LEADER 0 MEMBER YES
+server2:9999 Linux server2.localdomain server2 9999 9000 7 STANDBY 0 MEMBER YES
+server3:9999 Linux server3.localdomain server3 9999 9000 7 STANDBY 0 MEMBER YES
    </programlisting>
   </para>
   <para>
@@ -149,6 +149,7 @@ server3:9999 Linux server3.localdomain server3 9999 9000 7 STANDBY 0 MEMBER
     6. current node state name
     7. current cluster membership status
     8. current cluster membership status name
+    9. lifecheck started status (YES or NO)
     -->
     それ以降は watchdog ノードのリストが出力されます:
 
@@ -160,6 +161,7 @@ server3:9999 Linux server3.localdomain server3 9999 9000 7 STANDBY 0 MEMBER
     6. 現在のノードステータス名
     7. 現在のメンバーシップステータス
     8. 現在のメンバーシップステータス名
+    9. ライフチェックの開始状況
    </literallayout>
   </para>
   <para>
@@ -192,6 +194,7 @@ Node priority     : 1
 Status            : 4
 Status Name       : LEADER
 Membership Status : MEMBER
+Lifecheck Started : YES
 
 Node Name         : server2:9999 Linux server2.localdomain
 Host Name         : server2
@@ -202,6 +205,7 @@ Node priority     : 1
 Status            : 7
 Status Name       : STANDBY
 Membership Status : MEMBER
+Lifecheck Started : YES
 
 Node Name         : server3:9999 Linux server3.localdomain
 Host Name         : server3
@@ -212,6 +216,7 @@ Node priority     : 1
 Status            : 7
 Status Name       : STANDBY
 Membership Status : MEMBER
+Lifecheck Started : YES
   </programlisting>
  </refsect1>
 
diff --git a/doc/src/sgml/ref/pcp_watchdog_info.sgml b/doc/src/sgml/ref/pcp_watchdog_info.sgml
index ce357e93d..81da9a651 100644
--- a/doc/src/sgml/ref/pcp_watchdog_info.sgml
+++ b/doc/src/sgml/ref/pcp_watchdog_info.sgml
@@ -78,9 +78,9 @@ $ pcp_watchdog_info -h localhost -p 9898 -U postgres
 Password: 
 3 3 YES server1:9999 Linux server1.localdomain server1
 
-server1:9999 Linux server1.localdomain server1 9999 9000 4 LEADER 0 MEMBER
-server2:9999 Linux server2.localdomain server2 9999 9000 7 STANDBY 0 MEMBER
-server3:9999 Linux server3.localdomain server3 9999 9000 7 STANDBY 0 MEMBER
+server1:9999 Linux server1.localdomain server1 9999 9000 4 LEADER 0 MEMBER YES
+server2:9999 Linux server2.localdomain server2 9999 9000 7 STANDBY 0 MEMBER YES
+server3:9999 Linux server3.localdomain server3 9999 9000 7 STANDBY 0 MEMBER YES
    </programlisting>
   </para>
   <para>
@@ -105,6 +105,7 @@ server3:9999 Linux server3.localdomain server3 9999 9000 7 STANDBY 0 MEMBER
     6. current node state name
     7. current cluster membership status
     8. current cluster membership status name
+    9. lifecheck started status (YES or NO)
    </literallayout>
   </para>
   <para>
@@ -134,6 +135,7 @@ Node priority     : 1
 Status            : 4
 Status Name       : LEADER
 Membership Status : MEMBER
+Lifecheck Started : YES
 
 Node Name         : server2:9999 Linux server2.localdomain
 Host Name         : server2
@@ -144,6 +146,7 @@ Node priority     : 1
 Status            : 7
 Status Name       : STANDBY
 Membership Status : MEMBER
+Lifecheck Started : YES
 
 Node Name         : server3:9999 Linux server3.localdomain
 Host Name         : server3
@@ -154,6 +157,7 @@ Node priority     : 1
 Status            : 7
 Status Name       : STANDBY
 Membership Status : MEMBER
+Lifecheck Started : YES
   </programlisting>
  </refsect1>
 
diff --git a/src/include/pcp/pcp.h b/src/include/pcp/pcp.h
index e40b96bdc..15a4abb01 100644
--- a/src/include/pcp/pcp.h
+++ b/src/include/pcp/pcp.h
@@ -48,6 +48,8 @@ typedef struct PCPWDNodeInfo
 	int			wd_priority;	/* node priority in leader election */
 	int			pgpool_port;	/* pgpool port */
 	char		delegate_ip[WD_MAX_HOST_NAMELEN];	/* delegate IP */
+	bool		lifecheck_started; /* True means lifecheck is started,
+									* false means lifecheck is not started */
 	int			id;
 }			PCPWDNodeInfo;
 
diff --git a/src/include/watchdog/watchdog.h b/src/include/watchdog/watchdog.h
index 8803283f5..f7699e564 100644
--- a/src/include/watchdog/watchdog.h
+++ b/src/include/watchdog/watchdog.h
@@ -206,6 +206,8 @@ typedef struct WatchdogNode
 									 * initiated by remote */
 	SocketConnection client_socket; /* socket connections for this node
 									 * initiated by local */
+	bool		lifecheck_started;		/* True means lifecheck is started,
+										 * false means lifecheck is not started */
 } WatchdogNode;
 
 /*
diff --git a/src/include/watchdog/wd_commands.h b/src/include/watchdog/wd_commands.h
index a016772f6..f3d579efb 100644
--- a/src/include/watchdog/wd_commands.h
+++ b/src/include/watchdog/wd_commands.h
@@ -42,6 +42,8 @@ typedef struct WDNodeInfo
 	int			wd_priority;	/* node priority */
 	char		delegate_ip[WD_MAX_HOST_NAMELEN];	/* delegate IP */
 	int			id;
+	bool		lifecheck_started;	/* True means lifecheck is started,
+									 * false means lifecheck is not started */
 } WDNodeInfo;
 
 typedef struct WDGenericData
diff --git a/src/include/watchdog/wd_ipc_defines.h b/src/include/watchdog/wd_ipc_defines.h
index 7546bfa7e..9a8b85e7d 100644
--- a/src/include/watchdog/wd_ipc_defines.h
+++ b/src/include/watchdog/wd_ipc_defines.h
@@ -124,7 +124,7 @@ typedef enum WDValueDataType
 /* Use to inform node new node status by lifecheck */
 #define WD_LIFECHECK_NODE_STATUS_DEAD 	1
 #define WD_LIFECHECK_NODE_STATUS_ALIVE	2
-
+#define WD_LIFECHECK_NODE_LIFECHECK_STARTED	3
 
 
 #endif
diff --git a/src/include/watchdog/wd_lifecheck.h b/src/include/watchdog/wd_lifecheck.h
index 9460dc346..669ad5b7d 100644
--- a/src/include/watchdog/wd_lifecheck.h
+++ b/src/include/watchdog/wd_lifecheck.h
@@ -33,7 +33,8 @@ typedef enum NodeState
 {
 	NODE_EMPTY,
 	NODE_DEAD,
-	NODE_ALIVE
+	NODE_ALIVE,
+	NODE_LIFECHECK_STARTED
 } NodeStates;
 
 typedef struct LifeCheckNode
diff --git a/src/libs/pcp/pcp.c b/src/libs/pcp/pcp.c
index f8a635065..e0df470b0 100644
--- a/src/libs/pcp/pcp.c
+++ b/src/libs/pcp/pcp.c
@@ -1772,6 +1772,12 @@ process_watchdog_info_response(PCPConnInfo * pcpConn, char *buf, int len)
 				goto INVALID_RESPONSE;
 			}
 
+			if (json_get_bool_value_for_key(nodeInfoValue, "LifecheckStarted", &wdNodeInfo->lifecheck_started))
+			{
+				json_value_free(root);
+				goto INVALID_RESPONSE;
+			}
+
 		}
 		json_value_free(root);
 
diff --git a/src/tools/pcp/pcp_frontend_client.c b/src/tools/pcp/pcp_frontend_client.c
index 9f63a78f4..928749d99 100644
--- a/src/tools/pcp/pcp_frontend_client.c
+++ b/src/tools/pcp/pcp_frontend_client.c
@@ -835,7 +835,8 @@ output_watchdog_info_result(PCPResultInfo * pcpResInfo, bool verbose)
 			printf("Node priority     : %d\n", watchdog_info->wd_priority);
 			printf("Status            : %d\n", watchdog_info->state);
 			printf("Status Name       : %s\n", watchdog_info->stateName);
-			printf("Membership Status : %s\n\n", watchdog_info->membership_status_string);
+			printf("Membership Status : %s\n", watchdog_info->membership_status_string);
+			printf("Lifecheck Started : %s\n\n", watchdog_info->lifecheck_started ? "YES" : "NO");
 		}
 	}
 	else
@@ -851,7 +852,7 @@ output_watchdog_info_result(PCPResultInfo * pcpResInfo, bool verbose)
 		{
 			PCPWDNodeInfo *watchdog_info = &cluster->nodeList[i];
 
-			printf("%s %s %d %d %d %s %d %s\n",
+			printf("%s %s %d %d %d %s %d %s %s\n",
 				   watchdog_info->nodeName,
 				   watchdog_info->hostName,
 				   watchdog_info->pgpool_port,
@@ -859,7 +860,8 @@ output_watchdog_info_result(PCPResultInfo * pcpResInfo, bool verbose)
 				   watchdog_info->state,
 				   watchdog_info->stateName,
 				   watchdog_info->membership_status,
-				   watchdog_info->membership_status_string);
+				   watchdog_info->membership_status_string,
+				   watchdog_info->lifecheck_started ? "YES" : "NO");
 		}
 	}
 }
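In the non-verbose format the node name itself contains spaces (for example "server1:9999 Linux server1.localdomain"), so scripts cannot reliably locate fields by counting tokens from the left. Because the patch appends the new value as the last field of each line, a consumer can read it from the end instead. A hypothetical sketch of such a parser (the function name is illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: given one node line of non-verbose
 * pcp_watchdog_info output, e.g.
 *   "server1:9999 Linux server1.localdomain server1 9999 9000 4 LEADER 0 MEMBER YES"
 * report whether the trailing Lifecheck Started field is "YES".
 * The field is located from the end of the line because the node name
 * contains spaces. Returns false if the last token is neither YES nor NO.
 */
static bool
parse_lifecheck_started(const char *line, bool *started)
{
    size_t      end;
    size_t      start;

    assert(line != NULL && started != NULL);
    end = strlen(line);

    /* skip trailing whitespace, including the newline printed by printf */
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t' ||
                       line[end - 1] == '\n' || line[end - 1] == '\r'))
        end--;

    /* walk back to the first character of the last token */
    start = end;
    while (start > 0 && line[start - 1] != ' ' && line[start - 1] != '\t')
        start--;

    if (end - start == 3 && strncmp(line + start, "YES", 3) == 0)
        *started = true;
    else if (end - start == 2 && strncmp(line + start, "NO", 2) == 0)
        *started = false;
    else
        return false;
    return true;
}
```

A line that ends in MEMBER, as printed without this patch, is rejected rather than misread.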
diff --git a/src/watchdog/watchdog.c b/src/watchdog/watchdog.c
index f59e4373a..8d4c20951 100644
--- a/src/watchdog/watchdog.c
+++ b/src/watchdog/watchdog.c
@@ -2486,6 +2486,17 @@ fire_node_status_event(int nodeID, int nodeStatus)
 		else
 			watchdog_state_machine(WD_EVENT_REMOTE_NODE_FOUND, wdNode, NULL, NULL);
 	}
+	else if (nodeStatus == WD_LIFECHECK_NODE_LIFECHECK_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("processing node status changed to LIFECHECK STARTED event for node ID:%d", nodeID)));
+
+		if (wdNode == g_cluster.localNode)
+		{
+			wdNode->lifecheck_started = true;
+			send_message_of_type(NULL, WD_INFO_MESSAGE, NULL);
+		}
+	}
 	else
 		ereport(LOG,
 				(errmsg("failed to process node status change event"),
@@ -3856,6 +3867,7 @@ add_nodeinfo_to_json(JsonNode *jNode, WatchdogNode *node)
 	jw_put_int(jNode, "WdPort", nodeIfNull_int(wd_port, 0));
 	jw_put_int(jNode, "PgpoolPort", nodeIfNull_int(pgpool_port, 0));
 	jw_put_int(jNode, "Priority", nodeIfNull_int(wd_priority, 0));
+	jw_put_int(jNode, "LifecheckStarted", nodeIfNull_int(lifecheck_started, 0));
 
 	jw_end_element(jNode);
 
@@ -4510,6 +4522,7 @@ standard_packet_processor(WatchdogNode *wdNode, WDPacketData *pkt)
 				wdNode->escalated = tempNode->escalated;
 				wdNode->standby_nodes_count = tempNode->standby_nodes_count;
 				wdNode->quorum_status = tempNode->quorum_status;
+				wdNode->lifecheck_started = tempNode->lifecheck_started;
 
 				print_watchdog_node_info(wdNode);
 
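The watchdog.c changes form a small propagation protocol: only the node whose own lifecheck process reported in sets its flag and broadcasts its node information, and each receiver copies lifecheck_started into its record of the sender in standard_packet_processor(). The toy model below, with hypothetical SimNode and view names, sketches that flow under the assumption that send_message_of_type() with a NULL node broadcasts to every remote node. It is not the watchdog implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_NODES 3

/* Hypothetical, much simplified stand-in for WatchdogNode. */
typedef struct SimNode
{
    int         id;
    bool        lifecheck_started;
} SimNode;

/* view[i][j] is what node i currently believes about node j. */
static SimNode view[NUM_NODES][NUM_NODES];

static void
init_views(void)
{
    for (int i = 0; i < NUM_NODES; i++)
        for (int j = 0; j < NUM_NODES; j++)
        {
            view[i][j].id = j;
            view[i][j].lifecheck_started = false;
        }
}

/* Receiving side: copy the sender's flag into our record of the sender. */
static void
receive_info_message(int receiver, const SimNode *sender)
{
    view[receiver][sender->id].lifecheck_started = sender->lifecheck_started;
}

/*
 * LIFECHECK STARTED event on node "local": update only the local node's
 * own record, then broadcast that record to every other node.
 */
static void
lifecheck_started_event(int local)
{
    assert(local >= 0 && local < NUM_NODES);
    view[local][local].lifecheck_started = true;
    for (int r = 0; r < NUM_NODES; r++)
        if (r != local)
            receive_info_message(r, &view[local][local]);
}
```

Once every node has reported, each node's view holds true for all members, which corresponds to pcp_watchdog_info showing YES for every node regardless of which pgpool it is run against.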
diff --git a/src/watchdog/wd_commands.c b/src/watchdog/wd_commands.c
index 4b313e6c7..ddc13a0fb 100644
--- a/src/watchdog/wd_commands.c
+++ b/src/watchdog/wd_commands.c
@@ -425,6 +425,13 @@ parse_watchdog_node_info_from_wd_node_json(json_value *source)
 				 errdetail("unable to find state")));
 	}
 
+	if (json_get_bool_value_for_key(source, "LifecheckStarted", &wdNodeInfo->lifecheck_started))
+	{
+		ereport(ERROR,
+				(errmsg("invalid json data"),
+				 errdetail("unable to find LifecheckStarted")));
+	}
+
 	return wdNodeInfo;
 
 }
diff --git a/src/watchdog/wd_json_data.c b/src/watchdog/wd_json_data.c
index 91dd26a86..26ebd4b5e 100644
--- a/src/watchdog/wd_json_data.c
+++ b/src/watchdog/wd_json_data.c
@@ -517,6 +517,7 @@ get_watchdog_node_info_json(WatchdogNode *wdNode, char *authkey)
 	jw_put_int(jNode, "QuorumStatus", wdNode->quorum_status);
 	jw_put_int(jNode, "AliveNodeCount", wdNode->standby_nodes_count);
 	jw_put_bool(jNode, "Escalated", wdNode->escalated == 0 ? false : true);
+	jw_put_bool(jNode, "LifecheckStarted", wdNode->lifecheck_started);
 
 	if (authkey)
 		jw_put_string(jNode, "authkey", authkey);
@@ -589,7 +590,8 @@ get_watchdog_node_from_json(char *json_data, int data_len, char **authkey)
 		goto ERROR_EXIT;
 	if (json_get_int_value_for_key(root, "PgpoolNodeId", &wdNode->pgpool_node_id))
 		goto ERROR_EXIT;
-
+	if (json_get_bool_value_for_key(root, "LifecheckStarted", &wdNode->lifecheck_started))
+		goto ERROR_EXIT;
 
 	ptr = json_get_string_value_for_key(root, "NodeName");
 	if (ptr == NULL)
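The LifecheckStarted key is written in two ways in this patch: add_nodeinfo_to_json() uses jw_put_int() (via nodeIfNull_int), while get_watchdog_node_info_json() uses jw_put_bool(), so the value may appear as 1/0 or as true/false. Both readers use json_get_bool_value_for_key(); whether that helper accepts integers is not visible in this diff, and if it does not, writing the key with jw_put_bool() in both places avoids the question. The hypothetical helper below only sketches what a tolerant reader of the raw value token would accept; it returns 0 on success and nonzero on failure, matching how the patch tests the json_get_*_value_for_key() results.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: interpret the raw text of a JSON value as a
 * boolean, accepting the literal form ("true"/"false") produced by a
 * bool writer and the integer form ("1"/"0") produced by an int writer.
 * Returns 0 on success, -1 if the token is not a recognized boolean.
 */
static int
parse_json_bool_token(const char *token, bool *value)
{
    assert(token != NULL && value != NULL);

    if (strcmp(token, "true") == 0 || strcmp(token, "1") == 0)
        *value = true;
    else if (strcmp(token, "false") == 0 || strcmp(token, "0") == 0)
        *value = false;
    else
        return -1;
    return 0;
}
```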
diff --git a/src/watchdog/wd_lifecheck.c b/src/watchdog/wd_lifecheck.c
index a6958a395..77ac48a45 100644
--- a/src/watchdog/wd_lifecheck.c
+++ b/src/watchdog/wd_lifecheck.c
@@ -107,7 +107,7 @@ static int	is_wd_lifecheck_ready(void);
 static int	wd_lifecheck(void);
 static int	wd_ping_pgpool(LifeCheckNode *node, char *password);
 static pid_t fork_lifecheck_child(void);
-
+static bool inform_node_status(LifeCheckNode *node, char *message);
 
 LifeCheckCluster *gslifeCheckCluster = NULL;	/* lives in shared memory */
 
@@ -452,6 +452,9 @@ lifecheck_main(void)
 
 	ereport(LOG,
 			(errmsg("watchdog: lifecheck started")));
+	LifeCheckNode *node = &gslifeCheckCluster->lifeCheckNodes[0];
+	node->nodeState = NODE_LIFECHECK_STARTED;
+	inform_node_status(node, "lifecheck started");
 
 	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
 	{
@@ -547,6 +550,11 @@ inform_node_status(LifeCheckNode *node, char *message)
 		new_status = "NODE ALIVE";
 		node_status = WD_LIFECHECK_NODE_STATUS_ALIVE;
 	}
+	else if (node->nodeState == NODE_LIFECHECK_STARTED)
+	{
+		new_status = "NODE LIFECHECK STARTED";
+		node_status = WD_LIFECHECK_NODE_LIFECHECK_STARTED;
+	}
 	else
 		return false;
 
-- 
2.47.3


