On Wed, Apr 29, 2026 at 11:00 AM Alexander Lakhin <[email protected]> wrote:
> I was wondering why is that failure the only one of this kind on buildfarm
> (in last two years, at least), so I've tried to reproduce it on
> REL_18_STABLE... and failed.
> Then I've bisected it on the master branch and found (your) commit that
> introduced this behavior: 67c20979c from 2025-12-23.

I've confirmed that this race condition is present from v15 through
master. In v14, we have the procsignal barrier code but don't use it
anywhere. In v18 and older, it can happen when executing DROP
DATABASE, DROP TABLESPACE, etc., whereas in master it can happen in
more cases, as we now use procsignal barriers in more places. In any
case, if a process emits a signal barrier while another process is
between initializing slot->pss_barrierGeneration and slot->pss_pid,
the subsequent WaitForProcSignalBarrier() ends up waiting for that
process forever.
So I think the patch should be backpatched to v15. Please review these
patches.
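
To make the window concrete, here is a toy single-threaded model of the
interleaving. It is only a sketch, not the procsignal.c code: ToySlot,
toy_emit_barrier(), toy_absorb(), toy_waiter_stuck() and the pid value
are made up for illustration. It assumes the emitter skips signalling a
slot whose pss_pid is still 0, and that the waiter blocks until the
slot's generation catches up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a ProcSignal slot (hypothetical, heavily simplified). */
typedef struct ToySlot
{
    int      pss_pid;               /* 0 until the process has registered */
    uint64_t pss_barrierGeneration; /* last generation this slot absorbed */
} ToySlot;

/* Toy stand-in for the shared barrier generation counter. */
static uint64_t toy_shared_generation = 0;

/*
 * Emitter: bump the shared generation and "signal" the slot only if its
 * pid is already published (assumption of this model).  Returns whether
 * the slot was signalled; *gen receives the generation to wait for.
 */
static bool
toy_emit_barrier(const ToySlot *slot, uint64_t *gen)
{
    *gen = ++toy_shared_generation;
    return slot->pss_pid != 0;
}

/* A signalled process absorbs the barrier by catching up. */
static void
toy_absorb(ToySlot *slot)
{
    slot->pss_barrierGeneration = toy_shared_generation;
}

/*
 * Replays slot initialization with the barrier emitted either inside the
 * window (between the two stores) or after registration has completed.
 * Returns true if a waiter on this slot would never be released.
 */
bool
toy_waiter_stuck(bool emit_in_window)
{
    ToySlot  slot = {0, 0};
    uint64_t gen;
    bool     signalled;

    toy_shared_generation = 0;

    /* Step 1: the new process copies the current generation. */
    slot.pss_barrierGeneration = toy_shared_generation;

    if (emit_in_window)
    {
        /* Step 2: barrier emitted while pss_pid is still 0. */
        signalled = toy_emit_barrier(&slot, &gen);
        /* Step 3: pid published too late to receive the signal. */
        slot.pss_pid = 1234;
    }
    else
    {
        /* Registration completes first, then the barrier is emitted. */
        slot.pss_pid = 1234;
        signalled = toy_emit_barrier(&slot, &gen);
    }

    if (signalled)
        toy_absorb(&slot);

    /* The waiter blocks while the slot lags behind the emitted generation. */
    return slot.pss_barrierGeneration < gen;
}
```

In this model, toy_waiter_stuck(true) reports a stuck waiter because the
slot carries the old generation and is never signalled, while
toy_waiter_stuck(false) does not.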