From: shveta malik
Date: Fri, 27 Mar 2026 09:28:41 +0530
Subject: Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?
To: Nisha Moond
Cc: Amit Kapila, Fujii Masao, PostgreSQL Hackers, shveta malik

On Thu, Mar 26, 2026 at 4:08 PM Nisha Moond wrote:
>
> On Thu, Mar 26, 2026 at 3:40 PM Amit Kapila wrote:
> >
> > On Mon, Mar 23, 2026 at 11:21 AM Fujii Masao wrote:
> > >
> > > On Sun, Mar 22, 2026 at 1:52 AM Amit Kapila wrote:
> > > >
> > > > On Wed, Mar 18, 2026 at 9:35 PM Fujii Masao wrote:
> > > > >
> > > > > I noticed that during standby promotion the startup process sends SIGUSR1 to
> > > > > the slotsync worker to make it exit. Is there a reason for using SIGUSR1?
> > > > >
> > > >
> > > > IIRC, this same signal is used for both the backend executing
> > > > pg_sync_replication_slots() and the slotsync worker. We want the worker
> > > > to exit and the backend to error out. Using SIGTERM for the backend could
> > > > result in its exit.
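To make the point about SIGTERM concrete, here is a minimal standalone C sketch (a simplified model, not PostgreSQL source). A die()-style SIGTERM handler only records a termination request, and the interrupt-processing step turns that request into process exit, whether the process is the slotsync worker or a backend running pg_sync_replication_slots(). InterruptPending and ProcDiePending mirror PostgreSQL global names; die_handler(), process_interrupts(), desired_reaction() and the enums are hypothetical simplifications.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Simplified model, NOT PostgreSQL code; all function names are hypothetical. */
typedef enum { ROLE_SLOTSYNC_WORKER, ROLE_SQL_BACKEND } ProcRole;
typedef enum { ACTION_NONE, ACTION_EXIT, ACTION_ERROR } Action;

static volatile sig_atomic_t InterruptPending = 0;
static volatile sig_atomic_t ProcDiePending = 0;

/* die()-style SIGTERM handler: it only records the request. */
static void die_handler(int signo)
{
    (void) signo;
    ProcDiePending = 1;
    InterruptPending = 1;
}

static int install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = die_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Simplified interrupt processing: a pending "die" request always means
 * process exit; there is no role-dependent branch on this path.
 */
static Action process_interrupts(void)
{
    if (!InterruptPending)
        return ACTION_NONE;
    InterruptPending = 0;
    if (ProcDiePending)
        return ACTION_EXIT;
    return ACTION_NONE;
}

/* The reaction the paragraph above wants on promotion, per role. */
static Action desired_reaction(ProcRole role)
{
    assert(role == ROLE_SLOTSYNC_WORKER || role == ROLE_SQL_BACKEND);
    return role == ROLE_SLOTSYNC_WORKER ? ACTION_EXIT : ACTION_ERROR;
}
```

After raise(SIGTERM), process_interrupts() reports ACTION_EXIT, whereas desired_reaction() says a backend should get ACTION_ERROR; that mismatch is why plain SIGTERM does not fit the SQL-function case.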
> > >
> > > Why do we want the backend running pg_sync_replication_slots() to throw
> > > an error here, rather than just exit? If emitting an error is really required,
> > > another option would be to store the process type in SlotSyncCtx and send
> > > different signals accordingly, for example, SIGTERM for the slotsync worker
> > > and another signal for a backend. But it seems simpler and sufficient to have
> > > the backend exit in this case as well.
> > >
> >
> > As we want to retain the existing behavior for the API, instead of
> > using two signals, we can achieve what you intend with only one
> > signal (SIGUSR1). We can use the SendProcSignal mechanism, as used in
> > ParallelWorkerShutdown. On promotion, we send a SIGUSR1 signal to the
> > slotsync worker/backend via SendProcSignal. Then
> > procsignal_sigusr1_handler() will call HandleSlotSyncInterrupt(), which
> > will set the InterruptPending and SlotSyncPending flags. Then
> > ProcessInterrupts() will call a slotsync-specific function based on the
> > flag and do what we currently do in ProcessSlotSyncInterrupts(). I think
> > this should address the issue you are worried about.
> >
>
> +1
> Retaining the current behavior for the API backend keeps it consistent
> with other backends that continue after promotion.
>
> In the reproduced case, the worker (or API backend) is waiting in:
> libpqsrv_get_result -> WaitLatchOrSocket -> WaitEventSetWait.
> When SIGUSR1 is received, it only sets the latch but does not mark any
> interrupt as pending. As a result, CHECK_FOR_INTERRUPTS() is
> effectively a no-op, and the process goes back to waiting. So control
> never returns to the slotsync code path, and we cannot rely on
> stopSignaled to handle exit/error separately.
> Only SIGTERM works here because its handler sets
> INTERRUPTS_PENDING_CONDITION, allowing ProcessInterrupts() to run and
> break the loop.
> Other signals, such as SIGUSR1 or SIGINT, do not do this, so simply
> using another signal might not solve the API error handling case.
>
> I've implemented the above approach suggested by Amit in the attached
> patch and verified it for both worker and API scenarios. With this,
> the API can now error out without exiting the backend.
>

+1 on the idea. A few comments:

1) It was not clear initially why SetLatch is not done in
HandleSlotSyncShutdownInterrupt(); digging further revealed that
procsignal_sigusr1_handler() does SetLatch itself, after calling the
per-reason handlers. Perhaps you can add the comment below at the end of
HandleSlotSyncShutdownInterrupt(), similar to what other functions
(HandleProcSignalBarrierInterrupt, HandleRecoveryConflictInterrupt, etc.)
do.

/* latch will be set by procsignal_sigusr1_handler */

2) In ProcessSlotSyncInterrupts(), we no longer need the logic below, right?

if (SlotSyncCtx->stopSignaled)
{
    if (AmLogicalSlotSyncWorkerProcess())
    {
        ...
        proc_exit(0);
    }
    else
    {
        /*
         * For the backend executing SQL function
         * pg_sync_replication_slots().
         */
        ereport(ERROR,
                errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                errmsg("replication slot synchronization will stop because promotion is triggered"));
    }
}

thanks
Shveta
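The SendProcSignal-based flow discussed in this thread, together with the latch-only failure mode from the quoted report, can be modelled in standalone C. This is a sketch under stated assumptions, not the patch: raise() stands in for delivering SIGUSR1 to another process, one sig_atomic_t flag stands in for the ProcSignal slot, and wait_for_remote_result() stands in for the WaitLatchOrSocket() loop under libpqsrv_get_result(). All function names are hypothetical; only InterruptPending and SlotSyncPending come from the thread. The per-reason handler sets the pending flags and leaves the latch to the multiplexed handler, which is what comment 1 asks to document.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Standalone model, NOT PostgreSQL code; function names are hypothetical. */
typedef enum { ROLE_SLOTSYNC_WORKER, ROLE_SQL_BACKEND } ProcRole;
typedef enum { ACTION_NONE, ACTION_EXIT, ACTION_ERROR } Action;

static volatile sig_atomic_t slot_shutdown_requested = 0; /* "ProcSignal slot" */
static volatile sig_atomic_t InterruptPending = 0;
static volatile sig_atomic_t SlotSyncPending = 0;
static volatile sig_atomic_t latch_is_set = 0;
static int latch_wakeups = 0;   /* how often the wait loop saw the latch set */

/* Per-reason handler: sets the pending flags only. */
static void handle_slotsync_shutdown_interrupt(void)
{
    InterruptPending = 1;
    SlotSyncPending = 1;
    /* latch will be set by the multiplexed SIGUSR1 handler */
}

/* Multiplexed handler: dispatch on recorded reasons, then set the latch once. */
static void multiplexed_sigusr1_handler(int signo)
{
    (void) signo;
    if (slot_shutdown_requested)
    {
        slot_shutdown_requested = 0;
        handle_slotsync_shutdown_interrupt();
    }
    latch_is_set = 1;           /* SetLatch() */
}

static int install_sigusr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = multiplexed_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Record the reason first (SendProcSignal-style) or send a bare SIGUSR1. */
static int send_shutdown_signal(bool via_procsignal)
{
    if (via_procsignal)
        slot_shutdown_requested = 1;
    /* single-threaded: raise() runs the handler before it returns */
    return raise(SIGUSR1);
}

/* CHECK_FOR_INTERRUPTS()/ProcessInterrupts() stand-in. */
static Action check_for_interrupts(ProcRole role)
{
    assert(role == ROLE_SLOTSYNC_WORKER || role == ROLE_SQL_BACKEND);
    if (!InterruptPending)
        return ACTION_NONE;     /* nothing pending: effectively a no-op */
    InterruptPending = 0;
    if (SlotSyncPending)
    {
        SlotSyncPending = 0;
        /* worker exits; SQL-function backend gets an ERROR and stays alive */
        return role == ROLE_SLOTSYNC_WORKER ? ACTION_EXIT : ACTION_ERROR;
    }
    return ACTION_NONE;
}

/* Generic wait loop: no result ever arrives; promotion signals on iteration 2. */
static Action wait_for_remote_result(ProcRole role, bool via_procsignal,
                                     int max_iterations)
{
    for (int i = 0; i < max_iterations; i++)
    {
        if (i == 2 && send_shutdown_signal(via_procsignal) != 0)
            return ACTION_NONE;         /* could not deliver the signal */
        if (latch_is_set)
        {
            latch_is_set = 0;           /* ResetLatch() after waking up */
            latch_wakeups++;
        }
        Action action = check_for_interrupts(role);
        if (action != ACTION_NONE)
            return action;
        /* nothing to act on and no result yet: go back to waiting */
    }
    return ACTION_NONE;
}
```

With a bare SIGUSR1 the loop wakes up once (latch_wakeups becomes 1), but check_for_interrupts() finds nothing pending, so it keeps waiting until it runs out of iterations and returns ACTION_NONE. When the reason is recorded first, the same loop returns ACTION_EXIT for the worker and ACTION_ERROR for the SQL-function backend, without the loop itself knowing anything about slot synchronization.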