From: Nisha Moond
Date: Tue, 24 Mar 2026 09:31:28 +0530
Subject: Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?
To: Fujii Masao
Cc: Amit Kapila, PostgreSQL Hackers

On Mon, Mar 23, 2026 at 11:21 AM Fujii Masao wrote:
>
> On Sun, Mar 22, 2026 at 1:52 AM Amit Kapila wrote:
> >
> > On Wed, Mar 18, 2026 at 9:35 PM Fujii Masao wrote:
> > >
> > > I noticed that during standby promotion the startup process sends SIGUSR1 to
> > > the slotsync worker to make it exit. Is there a reason for using SIGUSR1?
> > >
> >
> > IIRC, this same signal is used for both the backend executing
> > pg_sync_replication_slots() and slotsync worker. We want the worker to
> > exit and error_out backend. Using SIGTERM for backend could result in
> > its exit.
>
> Why do we want the backend running pg_sync_replication_slots() to throw
> an error here, rather than just exit?
> If emitting an error is really required,
> another option would be to store the process type in SlotSyncCtx and send
> different signals accordingly, for example, SIGTERM for the slotsync worker
> and another signal for a backend. But it seems simpler and sufficient to have
> the backend exit in this case as well.
>
>
> > Also, we want the last slotsync cycle to complete before
> > promotion so that chances of subscribers that do failover/switchover
> > to new primary has better chances of finding failover slots
> > sync-ready.
>
> I'm not sure how much this behavior helps in failover/switchover scenarios.
> But the main issue is that if a primary crash triggers standby promotion,
> that last slotsync cycle can get stuck waiting for input from the primary,
> which delays promotion. IOW, failover time can become unnecessarily long
> due to the slotsync worker. I'd like to address that problem.
>

Hi Fujii-san,

I tried to reproduce the wait scenario you described, but could not.

Steps I followed:
1) Attach a debugger to the slotsync worker and hold it at
   fetch_remote_slots() ... -> libpqsrv_get_result().
2) Kill the primary.
3) Trigger promotion of the standby and release the debugger from the
   slotsync worker.

The slotsync worker stops when promotion is triggered and then restarts,
but fails to connect to the primary. The promotion itself happens
immediately.

```
LOG: received promote request
LOG: redo done at 0/0301AD40 system usage: CPU: user: 0.00 s, system: 0.02 s, elapsed: 4574.89 s
LOG: last completed transaction was at log time 2026-03-23 17:13:15.782313+05:30
LOG: replication slot synchronization worker will stop because promotion is triggered
LOG: slot sync worker started
ERROR: synchronization worker "slotsync worker" could not connect to the primary server: connection to server at "127.0.0.1", port 9933 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?
```

I'll debug this further to understand it better.
In the meantime, please let me know if I'm missing a step, or if you
used a specific setup or script to reproduce this scenario.

--
Thanks,
Nisha