Re: Pgpool-II - Tcp session time out between standby nodes


From: Bo Peng <[email protected]>
To: Nisrine Abdou <[email protected]>
To: [email protected] <[email protected]>
Subject: Re: Pgpool-II - Tcp session time out between standby nodes
Date: Wed, 17 Sep 2025 23:18:38 +0000
Message-ID: <TYWP286MB263382CB076C0BA450BB4403F217A@TYWP286MB2633.JPNP286.PROD.OUTLOOK.COM>
In-Reply-To: <PR3P195MB1119E18F000D0B73D87095A7F80EA@PR3P195MB1119.EURP195.PROD.OUTLOOK.COM>
References: <PR3P195MB1119E18F000D0B73D87095A7F80EA@PR3P195MB1119.EURP195.PROD.OUTLOOK.COM>

Hi,

Pgpool-II (watchdog) periodically connects to the other Pgpool-II nodes,
so the watchdog process should not remain idle.

To confirm whether the log lines you provided are actually error messages output by Pgpool-II,
could you please share the full log messages?

---
Bo Peng <[email protected]>
SRA OSS K.K.
TEL: 03-5979-2701 FAX: 03-5979-2702
Mobile: 080-7752-0749
URL: https://www.sraoss.co.jp/

________________________________________
From: Nisrine Abdou <[email protected]>
Sent: Thursday, September 11, 2025 1:05
To: [email protected] <[email protected]>
Subject: Pgpool-II - Tcp session time out between standby nodes

Hi all,

I'm new here, and I'm no network/Linux expert, so please bear with me :)

We have an issue on our Pgpool-II cluster where TCP sessions between standby nodes time out on the firewall but are not dropped on the servers.
This happens because the system's tcp_keepalive_time parameter is greater than the idle timeout configured on the firewall.
Hence, the standby nodes realize that the TCP connection between them is lost only when the system sends out its keepalive probe, which is too late.
This results in the following recurring messages in the Pgpool-II log files:

LOG:  read from socket failed
DETAIL:  Connection timed out
LOG:  client socket of dns:port Linux is closed
LOG:  new outbound connection to dns:port
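
The timing mismatch described above can be sketched with a little arithmetic. This is only an illustration, not part of Pgpool-II: the function names are hypothetical, the 7200-second keepalive time and the 60-minute firewall timeout are the values quoted later in this message, and the interval and probe count are the usual Linux defaults (75 seconds, 9 probes), assumed here to be unchanged on the servers.

```python
# Sketch only: hypothetical helper names, values taken from this thread
# plus the usual Linux defaults for the two settings not quoted in it.

def first_probe_arrives_in_time(keepalive_time_s, firewall_idle_timeout_s):
    """True if the first keepalive probe is sent before the firewall
    expires the idle session."""
    return keepalive_time_s < firewall_idle_timeout_s


def dead_peer_detection_s(keepalive_time_s, keepalive_intvl_s, keepalive_probes):
    """Approximate idle seconds before the kernel gives up on a peer
    that never answers: idle wait plus all unanswered probes."""
    return keepalive_time_s + keepalive_intvl_s * keepalive_probes


FIREWALL_TIMEOUT_S = 60 * 60  # 60 minutes, as stated in this message
KEEPALIVE_TIME_S = 7200       # current tcp_keepalive_time, as stated
KEEPALIVE_INTVL_S = 75        # assumed Linux default tcp_keepalive_intvl
KEEPALIVE_PROBES = 9          # assumed Linux default tcp_keepalive_probes

print(first_probe_arrives_in_time(KEEPALIVE_TIME_S, FIREWALL_TIMEOUT_S))
# False: the firewall forgets the session an hour before the first probe
print(dead_peer_detection_s(KEEPALIVE_TIME_S, KEEPALIVE_INTVL_S, KEEPALIVE_PROBES))
# 7875 seconds of idleness before "Connection timed out" is reported
```

Under these assumptions, any tcp_keepalive_time below 3600 seconds would make the first probe go out while the firewall still tracks the session.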

For info, it's a 3-node cluster spread over 3 different sites.
So, when this happens on a "normal day", it has no impact on the service.
But when it occurs in the middle of a failover (for instance, after losing the Master Pgpool node), during the election of the new Master, we end up in a split-brain situation caused by the lost connection between the 2 standby nodes.
The cluster then shuts down since the quorum is no longer met.

So, my questions are:
1- Is there any way to keep the client socket active and alive between the standby nodes?
2- Is there a tcp_keepalive configuration on the Pgpool-II side? Or should we modify the system's default configuration (currently tcp_keepalive_time = 7200)?
3- Could you please give your insight on the impact of modifying the system's tcp_keepalive parameters (tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes) so that keepalive probes are sent in less than an hour (the timeout configured on the firewall is 60 minutes)?
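
For context on questions 1 and 2, here is a minimal sketch of how an application on Linux can request keepalive timings for a single socket instead of relying on the system-wide defaults. It is a generic illustration and not Pgpool-II code; whether Pgpool-II exposes an equivalent setting is not established in this thread. The helper name and the timing values (10 minutes idle, 60-second interval, 5 probes) are assumptions chosen only to stay well under the 60-minute firewall timeout.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive for one socket (hypothetical helper).

    The per-socket options below override the system-wide
    tcp_keepalive_* values for this socket only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE and friends are platform-specific (present on Linux),
    # so only set them where the constants exist.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


# Illustrative values only; no connection is made here.
sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print(keepalive_on)
```

Changing the tcp_keepalive_* system parameters instead would affect every socket on the host that enables keepalive without setting its own timings, which is the trade-off question 3 asks about.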

Please advise.

Best Regards,
nissabissa
