From: Nisrine Abdou
To: "pgpool-general@lists.postgresql.org"
Subject: Pgpool-II - Tcp session time out between standby nodes
Date: Wed, 10 Sep 2025 16:05:35 +0000

Hi all,

 

I'm new here, and I'm no network/Linux expert, so please bear with me :)

 

We have an issue on our Pgpool-II cluster where TCP sessions between standby nodes are timed out on the firewall, but not dropped on the servers.

This happens because the system's tcp_keepalive_time parameter is greater than the timeout configured on the firewall.

Hence, the standby nodes only realize that the TCP connection between them is lost when the system sends out its keepalive probe, which is too late.

This results in the following recurring messages in the Pgpool-II log files:

 

LOG:  read from socket failed

DETAIL:  Connection timed out

LOG:  client socket of dns:port Linux is closed

LOG:  new outbound connection to dns:port
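To make the timing behind these messages concrete, here is a rough Python sketch of the arithmetic. Only tcp_keepalive_time = 7200 and the 60-minute firewall timeout come from our setup; the interval (75 s) and probe count (9) are assumed to be the stock Linux defaults, and the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveSettings:
    """Linux TCP keepalive parameters (names are illustrative)."""
    time: int    # idle seconds before the first probe (tcp_keepalive_time)
    intvl: int   # seconds between unanswered probes (tcp_keepalive_intvl)
    probes: int  # unanswered probes before the socket is declared dead (tcp_keepalive_probes)


def first_probe_beats_firewall(k: KeepaliveSettings, firewall_idle_timeout: int) -> bool:
    """True if the first probe is sent before the firewall forgets the session."""
    return k.time < firewall_idle_timeout


def seconds_until_dead_peer_detected(k: KeepaliveSettings) -> int:
    """Approximate idle time before an unreachable peer is noticed."""
    return k.time + k.intvl * k.probes


# tcp_keepalive_time is the value on our servers; intvl and probes are
# ASSUMED to be the Linux defaults, they were not checked on the nodes.
current = KeepaliveSettings(time=7200, intvl=75, probes=9)
FIREWALL_IDLE_TIMEOUT = 60 * 60  # 60 minutes, in seconds

print(first_probe_beats_firewall(current, FIREWALL_IDLE_TIMEOUT))  # False
print(seconds_until_dead_peer_detected(current))                   # 7875
```

Under these assumptions the first probe leaves an hour after the firewall has already dropped the idle session, and the socket is only reported as timed out after roughly 7875 seconds of idleness.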

 

For info, it's a 3-node cluster on 3 different sites.

So, when this happens on a "normal day", it has no impact on the service.

But when this occurs in the middle of a failover (after losing the Master Pgpool node, for instance), during the election of the new Master, we end up in a split-brain situation caused by the lost connection between the 2 standby nodes.

The cluster then shuts down since the quorum is no longer met.

 

So, my questions are:

1- Is there any way to keep the client socket active and alive between the standby nodes?

2- Is there a tcp_keepalive configuration on the Pgpool-II side? Or should we modify the system's default configuration (currently tcp_keepalive_time = 7200)?

3- Could you please give your insight on the impact of modifying the tcp_keepalive system parameters (tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes) so that keepalive probes are sent in less than an hour (the timeout configured on the firewall is 60 minutes)?
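For reference, this is the kind of per-socket setting I have in mind as an alternative to changing the system-wide defaults. It is a minimal Python sketch using the standard socket options, not anything taken from Pgpool-II, and I do not know whether Pgpool-II exposes an equivalent. The function name and the values (first probe after 30 minutes, then every 60 seconds, 5 probes) are hypothetical; they are only chosen to stay under the 60-minute firewall timeout.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Turn on TCP keepalive for one socket and override the system defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The three tuning options are platform-dependent (available on Linux),
    # so each one is only applied where the constant exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


# Hypothetical values, picked only so the first probe (1800 s) goes out
# well before the firewall's 3600 s idle timeout.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock, idle=1800, interval=60, count=5)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
sock.close()
```

The system-wide route would be the same three values set through the tcp_keepalive_* parameters, which would then apply to every socket on the host that has keepalive enabled, not only to the connections between the Pgpool-II nodes.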

 

Please advise.

 

Best Regards,

nissabissa


