public inbox for [email protected]  
help / color / mirror / Atom feed
From: Fujii Masao <[email protected]>
To: Vitaly Davydov <[email protected]>
Cc: [email protected]
Subject: Re: Deadlock detector fails to activate on a hot standby replica
Date: Thu, 21 May 2026 01:26:53 +0900
Message-ID: <CAHGQGwExVv1-mtPEX-JFGBXNY2r8DfP+sQVPD8GpDTefW0KdsQ@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

On Fri, Jan 23, 2026 at 8:52 PM Vitaly Davydov <[email protected]> wrote:
>
> Dear Hackers,
>
> I would like to propose a patch that fixes the problem, which has the roots in
> the possibility of spontaneous SIGALRM signals when waiting for some timeouts.
> The idea of the patch - ignore spontaneous SIGALRM signals and continue waiting
> for expected timeouts or buffer unpinning by the conflicting backend. This
> patch is not a final version. I plan to add a tap-test for this case.

Thanks for the patch!


 #include "storage/sinvaladt.h"
 #include "storage/standby.h"
+#include "storage/buf_internals.h"

This include should be placed in alphabetical order.


+ * We assume that only UnpinBuffer() and the timeout requests established
+ * above can wake us up here. WakeupRecovery() called by walreceiver or
+ * SIGHUP signal handler, etc cannot do that because it uses the different
+ * latch from that ProcWaitForSignal() waits on.

As your investigation showed, that assumption does not seem to hold. If so,
I think something like the following would be more accurate:

    ---------------------------------
    ProcWaitForSignal() can also wake up for unrelated reasons, so recheck
    whether we're still the waiter after each wakeup. If we are and no timeout
    fired, continue waiting without resetting the active timeouts.
    ---------------------------------


+ uint32 buf_refcount = BUF_STATE_GET_REFCOUNT(buf_state);
<snip>
+ if (buf_refcount > 1)
+ continue;

Wouldn't it be better to check explicitly whether we're still the waiter,
instead of using BUF_STATE_GET_REFCOUNT()?

    (buf_state & BM_PIN_COUNT_WAITER) != 0 &&
    bufHdr->wait_backend_pgprocno == MyProcNumber


The current control flow in the loop feels a bit hard to follow.
Would something like the following be simpler?

    for (;;)
    {
        ....
        ProcWaitForSignal(...);

        if (!StillWaitingForBufferPin(...))
            break;

        if (got_standby_delay_timeout)
        {
            SendRecoveryConflictWithBufferPin(...);
            break;
        }
        else if (got_standby_deadlock_timeout)
        {
            SendRecoveryConflictWithBufferPin(...);
            break;
        }
    }

Regards,

-- 
Fujii Masao






reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Deadlock detector fails to activate on a hot standby replica
  In-Reply-To: <CAHGQGwExVv1-mtPEX-JFGBXNY2r8DfP+sQVPD8GpDTefW0KdsQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox