Pgpool-II - Tcp session time out between standby nodes

* Pgpool-II - Tcp session time out between standby nodes
@ 2025-09-10 16:05 Nisrine Abdou <[email protected]>
  2025-09-17 23:18 ` Re: Pgpool-II - Tcp session time out between standby nodes Bo Peng <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Nisrine Abdou @ 2025-09-10 16:05 UTC (permalink / raw)
  To: [email protected] <[email protected]>

Hi all,

I'm new here, and I'm no network/Linux expert, so please bear with me :)

We have an issue on our Pgpool-II cluster where TCP sessions between standby nodes are timed out by the firewall but not closed on the servers.
This happens because the system's tcp_keepalive_time parameter is greater than the idle timeout configured on the firewall.
Hence, the standby nodes only realize that the TCP connection between them is lost when the system sends out its keepalive probe, which is too late.
This results in the following recurring messages in the Pgpool-II log files:

LOG:  read from socket failed
DETAIL:  Connection timed out
LOG:  client socket of dns:port Linux is closed
LOG:  new outbound connection to dns:port
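
The mismatch described above can be checked by reading the kernel keepalive settings and comparing them with the firewall idle timeout. This is a minimal sketch, assuming Linux (the settings are read from /proc/sys/net/ipv4) and the 60-minute firewall timeout mentioned further down in this message; the function names are illustrative and not part of Pgpool-II:

```python
from pathlib import Path

# Linux-specific location of the TCP keepalive sysctls.
PROC_DIR = Path("/proc/sys/net/ipv4")

SETTING_NAMES = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_settings(proc_dir=PROC_DIR):
    """Return the kernel keepalive settings as a dict of integers."""
    return {
        name: int((Path(proc_dir) / name).read_text().strip())
        for name in SETTING_NAMES
    }


def first_probe_beats_firewall(settings, firewall_timeout_s):
    """True if the first keepalive probe goes out before the firewall
    would drop an idle session."""
    return settings["tcp_keepalive_time"] < firewall_timeout_s
```

Calling `first_probe_beats_firewall(read_keepalive_settings(), 60 * 60)` on a node with tcp_keepalive_time = 7200 returns False, which matches the behaviour reported here: the firewall forgets the session before the first probe is sent.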

For info, it's a 3-node cluster spread over 3 different sites.
So, when this happens on a "normal day", it has no impact on the service.
But when it occurs in the middle of a failover (for instance, after losing the Master Pgpool node), during the election of the new Master, we end up in a split-brain situation caused by the lost connection between the 2 standby nodes.
The cluster then shuts down since the quorum is no longer met.

So, my questions are:
1- Is there any way to keep the client socket between the standby nodes active and alive?
2- Is there a tcp_keepalive configuration on the Pgpool-II side? Or should we modify the system's default configuration (currently tcp_keepalive_time = 7200)?
3- Could you please give your insight on the impact of modifying the system's tcp_keepalive parameters (tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes) so that keepalive probes are sent in less than an hour (the timeout configured on the firewall is 60 minutes)?
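
For reference on questions 2 and 3, the three kernel timers only apply to sockets that have SO_KEEPALIVE enabled, and on Linux an application can also override them per socket. The sketch below shows the arithmetic and the per-socket options in general terms; it is not Pgpool-II's implementation, whether the watchdog sockets enable keepalive is not established in this thread, and the values 600/30/5 are purely illustrative assumptions:

```python
import socket


def dead_peer_detection_seconds(idle_s, interval_s, probes):
    """Worst-case time for an idle connection to be declared dead:
    idle time before the first probe, plus every unanswered probe interval."""
    return idle_s + interval_s * probes


def enable_keepalive(sock, idle_s=600, interval_s=30, probes=5):
    """Enable keepalive on one socket and override the system-wide timers.

    Assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are
    the per-socket counterparts of tcp_keepalive_time, tcp_keepalive_intvl
    and tcp_keepalive_probes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With the Linux defaults for the other two timers (tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9), `dead_peer_detection_seconds(7200, 75, 9)` gives 7875 seconds, well beyond the 3600-second firewall timeout. Any idle time below 3600 seconds puts the first probe ahead of the firewall.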

Please advise.

Best Regards,
nissabissa

* Re: Pgpool-II - Tcp session time out between standby nodes
  2025-09-10 16:05 Pgpool-II - Tcp session time out between standby nodes Nisrine Abdou <[email protected]>
@ 2025-09-17 23:18 ` Bo Peng <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Bo Peng @ 2025-09-17 23:18 UTC (permalink / raw)
  To: Nisrine Abdou <[email protected]>; [email protected] <[email protected]>

Hi,

Pgpool-II (watchdog) periodically connects to the other Pgpool-II nodes,
so the watchdog process should not remain in an idle state.

To confirm whether the log messages you provided are actually error messages output by Pgpool-II,
could you please share the full log messages?

---
Bo Peng <[email protected]>
SRA OSS K.K.
TEL: 03-5979-2701 FAX: 03-5979-2702
Mobile: 080-7752-0749
URL: https://www.sraoss.co.jp/

