From: Fujii Masao
Date: Wed, 25 Mar 2026 09:51:17 +0900
Subject: Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?
To: Nisha Moond
Cc: Amit Kapila, PostgreSQL Hackers

On Wed, Mar 25, 2026 at 1:51 AM Nisha Moond wrote:
> Thank you, Fujii-san, for sharing the steps. I am now able to
> reproduce the behavior where promotion gets stuck because the slot
> sync worker remains in a wait loop.

Thanks for the test!

> As an experiment, I tried setting tcp_user_timeout to 7000 / 15000
> (using slightly higher values for debugging). With this setting, the
> TCP stack terminates the connection if data sent to the primary
> remains unacknowledged beyond the configured timeout (e.g., due to a
> network drop). In such cases the slot sync worker exits instead of
> waiting indefinitely. With an appropriately tuned timeout, this could
> help avoid the promotion issue by ensuring the worker does not remain
> stuck when the connection to the primary is lost.

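For reference, an experiment like the one quoted above might be configured roughly as follows on the standby. This is only a sketch: the host, user, and dbname are placeholders, the keepalive values are arbitrary examples rather than recommendations, and I am assuming the timeouts are passed as libpq connection parameters in primary_conninfo (where tcp_user_timeout is given in milliseconds and the keepalives_* settings in seconds).

```
# postgresql.conf on the standby (illustrative values only)
# host, user, and dbname are placeholders, not taken from this thread
primary_conninfo = 'host=primary.example.com user=repluser dbname=postgres tcp_user_timeout=7000 keepalives=1 keepalives_idle=5 keepalives_interval=2 keepalives_count=3'
```
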
Yes, TCP timeout settings like tcp_user_timeout, keepalives, and
net.ipv4.tcp_retries2 can help in this situation. However, they involve
a trade-off: very small timeouts can reduce failover time but increase
the risk of false network failure detection, while larger timeouts
(e.g., 10s) avoid false positives but can delay failover by that amount.

Because of this, I think it's better to address the issue without
relying on such TCP timeout parameters. Also, tcp_user_timeout is not
available on platforms that don't support TCP_USER_TIMEOUT (e.g.,
Windows).

Regards,

-- 
Fujii Masao