From: Fujii Masao
Date: Thu, 21 May 2026 01:26:53 +0900
Subject: Re: Deadlock detector fails to activate on a hot standby replica
To: Vitaly Davydov
Cc: pgsql-hackers@lists.postgresql.org
References: <44c24dcf-5710-410f-b1b6-d10b315f3d51@postgrespro.ru>

On Fri, Jan 23, 2026 at 8:52 PM Vitaly Davydov wrote:
>
> Dear Hackers,
>
> I would like to propose a patch that fixes the problem, which has the roots in
> the possibility of spontaneous SIGALRM signals when waiting for some timeouts.
> The idea of the patch - ignore spontaneous SIGALRM signals and continue waiting
> for expected timeouts or buffer unpinning by the conflicting backend. This
> patch is not a final version. I plan to add a tap-test for this case.

Thanks for the patch!

 #include "storage/sinvaladt.h"
 #include "storage/standby.h"
+#include "storage/buf_internals.h"

This include should be placed in alphabetical order.

+ * We assume that only UnpinBuffer() and the timeout requests established
+ * above can wake us up here. WakeupRecovery() called by walreceiver or
+ * SIGHUP signal handler, etc cannot do that because it uses the different
+ * latch from that ProcWaitForSignal() waits on.

As your investigation showed, that assumption does not seem to hold. If so,
I think something like the following would be more accurate:

---------------------------------
ProcWaitForSignal() can also wake up for unrelated reasons, so recheck
whether we're still the waiter after each wakeup. If we are and no timeout
fired, continue waiting without resetting the active timeouts.
---------------------------------

+ uint32 buf_refcount = BUF_STATE_GET_REFCOUNT(buf_state);
+ if (buf_refcount > 1)
+     continue;

Wouldn't it be better to check explicitly whether we're still the waiter,
instead of using BUF_STATE_GET_REFCOUNT()?

    (buf_state & BM_PIN_COUNT_WAITER) != 0 &&
    bufHdr->wait_backend_pgprocno == MyProcNumber

The current control flow in the loop feels a bit hard to follow. Would
something like the following be simpler?

    for (;;)
    {
        ....
        ProcWaitForSignal(...);

        if (!StillWaitingForBufferPin(...))
            break;

        if (got_standby_delay_timeout)
        {
            SendRecoveryConflictWithBufferPin(...);
            break;
        }
        else if (got_standby_deadlock_timeout)
        {
            SendRecoveryConflictWithBufferPin(...);
            break;
        }
    }

Regards,

-- 
Fujii Masao
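
For illustration only, the loop shape suggested above can be exercised
outside PostgreSQL with a small C simulation. Everything in it (WakeEvent,
WaitResult, wait_for_buffer_pin) is a hypothetical stand-in and not an
existing PostgreSQL function. The real wait, the buffer header check, and the
timeout flags are replaced by a scripted list of wakeups, so this is only a
sketch of the control flow under those assumptions, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wakeup sources; these are NOT PostgreSQL APIs. */
typedef enum
{
    WAKE_SPURIOUS,          /* unrelated latch set or stray signal */
    WAKE_DELAY_TIMEOUT,     /* stands in for got_standby_delay_timeout */
    WAKE_DEADLOCK_TIMEOUT,  /* stands in for got_standby_deadlock_timeout */
    WAKE_UNPINNED           /* the conflicting backend released its pin */
} WakeEvent;

typedef enum
{
    RESULT_UNPINNED,           /* we are no longer the waiter */
    RESULT_DELAY_CONFLICT,     /* delay timeout fired while still waiting */
    RESULT_DEADLOCK_CONFLICT,  /* deadlock timeout fired while still waiting */
    RESULT_STILL_WAITING       /* script ran out before anything happened */
} WaitResult;

/*
 * Replay a scripted sequence of wakeups through the suggested loop shape.
 * On return, *wakeups holds how many times the simulated wait returned.
 */
WaitResult
wait_for_buffer_pin(const WakeEvent *events, int nevents, int *wakeups)
{
    bool still_waiter = true;

    assert(events != NULL && wakeups != NULL && nevents >= 0);
    *wakeups = 0;

    for (int i = 0; i < nevents; i++)
    {
        bool got_delay_timeout = false;
        bool got_deadlock_timeout = false;

        /* Simulated wait: every scripted event counts as one wakeup. */
        (*wakeups)++;
        switch (events[i])
        {
            case WAKE_SPURIOUS:
                break;
            case WAKE_DELAY_TIMEOUT:
                got_delay_timeout = true;
                break;
            case WAKE_DEADLOCK_TIMEOUT:
                got_deadlock_timeout = true;
                break;
            case WAKE_UNPINNED:
                still_waiter = false;
                break;
        }

        /* Recheck the waiter condition first, after every wakeup. */
        if (!still_waiter)
            return RESULT_UNPINNED;

        if (got_delay_timeout)
            return RESULT_DELAY_CONFLICT;
        else if (got_deadlock_timeout)
            return RESULT_DEADLOCK_CONFLICT;

        /* Spurious wakeup: loop again, leaving the timeouts untouched. */
    }

    return RESULT_STILL_WAITING;
}
```

With a script of two spurious wakeups followed by a deadlock timeout, the
function keeps looping through the first two events and only reports the
conflict on the third wakeup.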