From: Masahiko Sawada
Date: Tue, 28 Apr 2026 12:27:59 -0700
Subject: Re: Startup process deadlock: WaitForProcSignalBarriers vs aux process
To: Alexander Lakhin
Cc: Andres Freund, Matthias van de Meent, Thomas Munro, PostgreSQL Hackers, Heikki Linnakangas
In-Reply-To: <4358bd85-f6b4-4da6-9909-74428fe3c8f7@gmail.com>

On Mon, Apr 27, 2026 at 11:00 AM Alexander Lakhin wrote:
>
> Hello Sawada-san,
>
> 24.04.2026 20:52, Masahiko Sawada wrote:
>
> Right. The postmaster blocks all signals before starting child process
> as the following comment explains:
>
>     /*
>      * We start postmaster children with signals blocked. This allows them to
>      * install their own handlers before unblocking, to avoid races where they
>      * might run the postmaster's handler and miss an important control
>      * signal. With more analysis this could potentially be relaxed.
>      */
>     sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
>
> Investigating the issue, I found there is a race condition between the
> procsignal initialization and emitting signal barrier that could be
> the cause of this issue. Imagine the following scenario:
>
> 1. In ProcSignalInit(), the checkpointer initializes its
> slot->pss_barrierGeneration with the global generation.
> 2. In EmitProcSignalBarrier(), the startup checks the checkpointer's
> procsignal slot but it skips emitting the signal as slot->pss_pid is
> still 0. It can happen even though the checkpointer holds a spinlock
> on its slot during the initialization because the first pid check is
> done without a spinlock acquisition.
> 3. The checkpointer sets its pid to slot->pss_pid and releases the spin lock.
> 4. In WaitForProcSignalBarrier(), the startup checks the
> checkpointer's procsignal slot that has already initialized the
> pss_barrierGeneration, and waits for it to be updated. However, the
> checkpointer never updates its barrier generation as it doesn't get
> the signal.
>
>
> Thank you for the investigation and explanation of the issue!
>
> I've been puzzled by a buildfarm failure [1] with such symptoms for a while
> and even reproduced it locally once, but couldn't gather more information
> that time. But now that you have described the scenario, I can easily
> reproduce the same test failure with:
> --- a/src/backend/storage/ipc/procsignal.c
> +++ b/src/backend/storage/ipc/procsignal.c
> @@ -206,6 +206,7 @@ ProcSignalInit(const uint8 *cancel_key, int cancel_key_len)
>      if (cancel_key_len > 0)
>          memcpy(slot->pss_cancel_key, cancel_key, cancel_key_len);
>      slot->pss_cancel_key_len = cancel_key_len;
> +pg_usleep(10000);
>      pg_atomic_write_u32(&slot->pss_pid, MyProcPid);

Thank you for testing this.

I've attached a patch to address the issue. I haven't verified it
across all versions yet, but I suspect it exists in the stable
branches as well.

Previously, the issue rarely occurred because EmitProcSignalBarrier()
was only used for smgr invalidation. However, now that we use signal
barriers for online wal_level changes and checksum status updates,
this race condition is likely to be encountered more frequently.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment: v1-0001-Fix-race-between-ProcSignalInit-and-EmitProcSigna.patch

From 8ed72e1bc748f99fbf8b103ae5bd4cf395cb54ef Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 28 Apr 2026 12:21:21 -0700
Subject: [PATCH v1] Fix race between ProcSignalInit() and
 EmitProcSignalBarrier().

ProcSignalInit() read the global barrier generation before publishing
its PID into pss_pid. A concurrent EmitProcSignalBarrier() iterates
the ProcSignal slots and skips any whose pss_pid is still zero, on the
assumption that such a slot will pick up the new generation when it
later reads psh_barrierGeneration. But because the joining backend had
already read the (older) global generation under its slot's spinlock,
it would store a stale value into pss_barrierGeneration and never
absorb the just-emitted barrier, resulting that
WaitForProcSignalBarrier() didn't complete.

Publish pss_pid before reading psh_barrierGeneration, with a memory
barrier in between so that the store is globally visible first. A
concurrent EmitProcSignalBarrier() then either observes the published
PID and signals this slot, or completes its generation increment
before we load it.

Discussion: https://postgr.es/m/CAEze2WgAJmWReDN7Chtba8Er2YBvKCoa0KVN25-1evnTrHsLyA@mail.gmail.com
Backpatch-through:
---
 src/backend/storage/ipc/procsignal.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 264e4c22ca6..b0681ca0ae2 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -188,6 +188,16 @@ ProcSignalInit(const uint8 *cancel_key, int cancel_key_len)
 	/* Clear out any leftover signal reasons */
 	MemSet(slot->pss_signalFlags, 0, NUM_PROCSIGNALS * sizeof(sig_atomic_t));
 
+	/*
+	 * Publish the PID before reading the global barrier generation to ensure
+	 * that EmitProcSignalBarrier() doesn't skip us while we are grabbing an
+	 * older generation. We need a memory barrier here to make sure that the
+	 * update of pss_pid is globally visible before the load of the global
+	 * barrier generation executes.
+	 */
+	pg_atomic_write_u32(&slot->pss_pid, MyProcPid);
+	pg_memory_barrier();
+
 	/*
 	 * Initialize barrier state. Since we're a brand-new process, there
 	 * shouldn't be any leftover backend-private state that needs to be
@@ -207,7 +217,6 @@ ProcSignalInit(const uint8 *cancel_key, int cancel_key_len)
 	if (cancel_key_len > 0)
 		memcpy(slot->pss_cancel_key, cancel_key, cancel_key_len);
 	slot->pss_cancel_key_len = cancel_key_len;
-	pg_atomic_write_u32(&slot->pss_pid, MyProcPid);
 
 	SpinLockRelease(&slot->pss_mutex);
 
-- 
2.54.0
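The four-step scenario described in this message, and the reordering the patch proposes, can be replayed as a small single-threaded model. This is only an illustrative sketch, not PostgreSQL code: SlotModel, SharedModel and every function below are hypothetical stand-ins, the real spinlock, atomics and signal delivery are left out, and each interleaving is fixed by hand rather than produced by real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the shared procsignal state. */
typedef struct
{
	uint32_t	pid;				/* models pss_pid; 0 = not yet published */
	uint64_t	barrier_generation; /* models pss_barrierGeneration */
	bool		signalled;			/* models a pending barrier signal */
} SlotModel;

typedef struct
{
	uint64_t	global_generation;	/* models psh_barrierGeneration */
	SlotModel	slot;
} SharedModel;

/* Emitter: bump the global generation, then signal only published slots. */
static uint64_t
emit_barrier(SharedModel *shared)
{
	uint64_t	generation = ++shared->global_generation;

	if (shared->slot.pid != 0)
		shared->slot.signalled = true;
	return generation;
}

/* Joining process: absorb a pending barrier, but only if it was signalled. */
static void
absorb_barrier(SharedModel *shared)
{
	if (shared->slot.signalled)
	{
		shared->slot.barrier_generation = shared->global_generation;
		shared->slot.signalled = false;
	}
}

/* Waiter: a published slot must have reached the emitted generation. */
static bool
wait_would_complete(const SharedModel *shared, uint64_t generation)
{
	return shared->slot.pid == 0 ||
		shared->slot.barrier_generation >= generation;
}

/* Old ordering: read the generation, get overtaken, then publish the PID. */
static bool
old_order_completes(void)
{
	SharedModel shared = {0};
	uint64_t	generation;

	/* Step 1: the slot copies the (soon to be stale) global generation. */
	shared.slot.barrier_generation = shared.global_generation;
	/* Step 2: the emitter sees pid == 0 and skips the slot. */
	generation = emit_barrier(&shared);
	/* Step 3: the PID is published too late to be signalled. */
	shared.slot.pid = 1234;
	absorb_barrier(&shared);	/* nothing pending, generation stays stale */
	/* Step 4: the waiter sees a published slot with an old generation. */
	return wait_would_complete(&shared, generation);
}

/* Patched ordering: publish the PID first, then read the generation. */
static bool
patched_order_completes(bool emit_before_publish)
{
	SharedModel shared = {0};
	uint64_t	generation = 0;

	if (emit_before_publish)
		generation = emit_barrier(&shared); /* skips the slot, but has bumped */
	shared.slot.pid = 1234;
	if (!emit_before_publish)
		generation = emit_barrier(&shared); /* sees the PID and signals */
	shared.slot.barrier_generation = shared.global_generation;
	absorb_barrier(&shared);
	return wait_would_complete(&shared, generation);
}
```

In this model old_order_completes() returns false, which corresponds to the hang in WaitForProcSignalBarrier(), while patched_order_completes() returns true for both interleavings: either the emitter has already incremented the generation before the slot reads it, or it observes the published PID and signals the slot.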