Right. The postmaster blocks all signals before starting a child process,
as the following comment explains:
/*
* We start postmaster children with signals blocked. This allows them to
* install their own handlers before unblocking, to avoid races where they
* might run the postmaster's handler and miss an important control
* signal. With more analysis this could potentially be relaxed.
*/
sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
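Here is a minimal sketch of the pattern that comment describes. It uses plain POSIX calls, not PostgreSQL's own code, and the helper name is hypothetical. The parent blocks every signal, forks, and sends SIGUSR1 to the child right away. Because the child starts with everything blocked, an early signal stays pending instead of hitting the inherited default disposition, which for SIGUSR1 would terminate the child. The child installs its own handler first and only then unblocks:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the child's own handler; sig_atomic_t keeps the write signal-safe. */
static volatile sig_atomic_t got_usr1 = 0;

static void child_usr1_handler(int signo)
{
    (void) signo;
    got_usr1 = 1;
}

/*
 * Hypothetical helper: fork a child with all signals blocked and send it
 * SIGUSR1 immediately. Returns 0 if the child's own handler caught the
 * signal, -1 on failure or if the child was killed by the signal.
 */
static int spawn_child_with_blocked_signals(void)
{
    sigset_t block_all, save_mask;
    pid_t pid;
    int status;

    /* Block everything before forking; the child inherits this mask. */
    sigfillset(&block_all);
    if (sigprocmask(SIG_SETMASK, &block_all, &save_mask) != 0)
        return -1;

    pid = fork();
    if (pid == 0)
    {
        struct sigaction sa;
        sigset_t wait_mask;

        /* Child: install its own handler while signals are still blocked. */
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = child_usr1_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);

        /* Atomically unblock SIGUSR1 and wait; a signal that arrived early
         * is still pending and is delivered to the new handler here. */
        wait_mask = save_mask;
        sigdelset(&wait_mask, SIGUSR1);
        while (!got_usr1)
            sigsuspend(&wait_mask);

        _exit(0);
    }

    if (pid > 0)
        kill(pid, SIGUSR1);     /* may arrive before the child's sigaction() */

    /* Parent: restore its original mask. */
    sigprocmask(SIG_SETMASK, &save_mask, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Whether the parent's kill() lands before or after the child's sigaction(), the child ends up running its own handler, which is the race the comment says blocking avoids.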
Investigating the issue, I found a race condition between procsignal
initialization and the emission of a signal barrier that could be the
cause of this issue. Imagine the following scenario:
1. In ProcSignalInit(), the checkpointer initializes its
slot->pss_barrierGeneration with the global generation.
2. In EmitProcSignalBarrier(), the startup process checks the
checkpointer's procsignal slot but skips signaling it because
slot->pss_pid is still 0. This can happen even though the checkpointer
holds the spinlock on its slot during initialization, because the first
pid check is done without acquiring the spinlock.
3. The checkpointer stores its pid in slot->pss_pid and releases the spinlock.
4. In WaitForProcSignalBarrier(), the startup process checks the
checkpointer's procsignal slot, whose pss_barrierGeneration has already
been initialized, and waits for it to be updated. However, the
checkpointer never updates its barrier generation because it never
receives the signal.
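The four steps above can be replayed deterministically in a small single-threaded model. This is a simplified sketch, not PostgreSQL code: ModelSlot and the model_* functions are hypothetical, spinlocks and real signals are omitted, an unused slot is represented by a generation of UINT64_MAX (an assumption of this model), and the endless wait is bounded by max_polls so the model terminates:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one procsignal slot (not PostgreSQL's struct). */
typedef struct
{
    int      pid;                /* 0 until the owner publishes it (step 3) */
    uint64_t barrier_generation; /* UINT64_MAX means "slot unused" here */
    bool     signalled;          /* did the emitter signal this slot? */
} ModelSlot;

static uint64_t global_generation;

/* Step 1: the owner copies the current global generation into its slot. */
static void model_init_generation(ModelSlot *slot)
{
    slot->barrier_generation = global_generation;
}

/* Step 2: the emitter bumps the generation, then signals the slot only if
 * its unlocked pid check sees a nonzero pid. */
static uint64_t model_emit_barrier(ModelSlot *slot)
{
    uint64_t generation = ++global_generation;

    if (slot->pid != 0)
        slot->signalled = true;
    return generation;
}

/* Step 3: the owner publishes its pid (the spinlock is omitted here). */
static void model_publish_pid(ModelSlot *slot, int pid)
{
    slot->pid = pid;
}

/* The owner absorbs the barrier only if it was actually signalled. */
static void model_owner_absorb(ModelSlot *slot)
{
    if (slot->signalled)
    {
        slot->signalled = false;
        slot->barrier_generation = global_generation;
    }
}

/* Step 4: the emitter polls until the slot catches up; max_polls stands in
 * for "forever" so the model terminates. */
static bool model_wait_for_barrier(ModelSlot *slot, uint64_t generation,
                                   int max_polls)
{
    for (int i = 0; i < max_polls; i++)
    {
        if (slot->barrier_generation >= generation)
            return true;
        model_owner_absorb(slot);
    }
    return false;
}

/* Replays the steps; racy == true uses the order from the scenario. */
static bool model_replay(bool racy)
{
    ModelSlot slot = { 0, UINT64_MAX, false };

    global_generation = 1;
    model_init_generation(&slot);                 /* step 1 */
    if (!racy)
        model_publish_pid(&slot, 4242);           /* pid visible before emit */
    uint64_t target = model_emit_barrier(&slot);  /* step 2 */
    if (racy)
        model_publish_pid(&slot, 4242);           /* step 3 */
    return model_wait_for_barrier(&slot, target, 100);  /* step 4 */
}
```

With racy set to true, model_replay() returns false: the slot's generation stays at 1 while the target is 2, which corresponds to the startup process waiting forever. If the pid is published before the barrier is emitted, the slot is signalled and the wait completes.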