MIME-Version: 1.0
From: Ayush Tiwari
Date: Mon, 20 Apr 2026 01:39:51 +0530
Subject: [BUG] Race in online checksums launcher_exit()
To: pgsql-hackers@postgresql.org
Content-Type: multipart/mixed; boundary="000000000000812653064fd5c4d0"

--000000000000812653064fd5c4d0
Content-Type: multipart/alternative; boundary="000000000000812652064fd5c4ce"

--000000000000812652064fd5c4ce
Content-Type: text/plain; charset="UTF-8"

Hi hackers,

While using the pg_enable_data_checksums() feature, I found a likely bug, a race condition in datachecksum_state.c's launcher_exit().

When pg_enable_data_checksums() is called twice before the first launcher starts, two background workers are registered (the code expects this). The redundant launcher exits early, but its launcher_exit() callback unconditionally clears the shared launcher_running flag and may call SetDataChecksumsOff() -- even though it never owned the flag.

This allows a third pg_enable_data_checksums() call to start another launcher concurrently with the first (duplicate work, doubled I/O, spurious warnings). Worse, if the redundant launcher initialized after the winner transitioned to inprogress-on, its exit handler calls SetDataChecksumsOff(), silently aborting the enable operation. (I have not triggered the SetDataChecksumsOff() part, but I am calling it out because it looks like a likely scenario depending on the timing of the workers.)

Reproduced by firing three calls in quick succession:

  psql -c "SELECT pg_enable_data_checksums();" &
  psql -c "SELECT pg_enable_data_checksums();" &
  sleep 0.5
  psql -c "SELECT pg_enable_data_checksums();" &

Log shows two launchers processing databases concurrently:

  [2093292] LOG:  enabling data checksums requested
  [2093293] LOG:  already running, exiting
  [2093299] LOG:  enabling data checksums requested     -- third launcher admitted
  [2093292] LOG:  processing database "postgres"
  [2093299] LOG:  processing database "postgres"        -- same DB, concurrently
  [2093299] WARNING:  cannot set data checksums to "on", current state is not "inprogress-on"

I think the process-local launcher_running flag exists for this purpose and is already used for the worker-kill block, but the flag-clear and state-revert blocks do not use it.

The attached patch returns early from launcher_exit() when the local flag is false. Thoughts?

Regards,
Ayush

--000000000000812652064fd5c4ce
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Hi hackers,

While using the pg_enable_data_checksums() feature, I found a likely bug, a race condition in datachecksum_state.c's launcher_exit().

When pg_enable_data_checksums() is called twice before the first launcher starts, two background workers are registered (the code expects this). The redundant launcher exits early, but its launcher_exit() callback unconditionally clears the shared launcher_running flag and may call SetDataChecksumsOff() -- even though it never owned the flag.

This allows a third pg_enable_data_checksums() call to start another launcher concurrently with the first (duplicate work, doubled I/O, spurious warnings). Worse, if the redundant launcher initialized after the winner transitioned to inprogress-on, its exit handler calls SetDataChecksumsOff(), silently aborting the enable operation. (I have not triggered the SetDataChecksumsOff() part, but I am calling it out because it looks like a likely scenario depending on the timing of the workers.)
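
The sequence described above can be sketched as a small standalone model. This is not PostgreSQL code: the struct names, the function names, and the single-threaded replay are all assumptions made for illustration, and only the shape of the problem (a shared flag cleared by a process that never set it) is taken from the description.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared-memory state; not the real struct. */
typedef struct
{
    bool launcher_running;  /* shared: some launcher holds the role */
} SharedModel;

/* Hypothetical per-process state. */
typedef struct
{
    bool owns_role;         /* process-local: this process set the shared flag */
} LauncherModel;

/* Claim the launcher role only if nobody holds it; returns true if claimed. */
static bool
model_launcher_start(SharedModel *shared, LauncherModel *self)
{
    self->owns_role = false;
    if (shared->launcher_running)
        return false;       /* the "already running, exiting" path */
    shared->launcher_running = true;
    self->owns_role = true;
    return true;
}

/* Exit callback as described: clears the shared flag regardless of ownership. */
static void
model_launcher_exit_unconditional(SharedModel *shared, LauncherModel *self)
{
    (void) self;            /* ownership is never consulted */
    shared->launcher_running = false;
}

/* Replay the three calls; returns how many launchers hold the role at the end. */
static int
model_replay_three_calls(void)
{
    SharedModel shared = {false};
    LauncherModel first, second, third;

    model_launcher_start(&shared, &first);                /* wins the role */
    model_launcher_start(&shared, &second);               /* redundant, exits early */
    model_launcher_exit_unconditional(&shared, &second);  /* clears a flag it never owned */
    model_launcher_start(&shared, &third);                /* admitted while first still runs */

    return (int) first.owns_role + (int) third.owns_role;
}
```

In this model model_replay_three_calls() returns 2, i.e. two launchers believe they own the role at the same time.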

Reproduced by firing three calls in quick succession:

  psql -c "SELECT pg_enable_data_checksums();" &
  psql -c "SELECT pg_enable_data_checksums();" &
  sleep 0.5
  psql -c "SELECT pg_enable_data_checksums();" &

Log shows two launchers processing databases concurrently:

  [2093292] LOG:  enabling data checksums requested
  [2093293] LOG:  already running, exiting
  [2093299] LOG:  enabling data checksums requested     -- third launcher admitted
  [2093292] LOG:  processing database "postgres"
  [2093299] LOG:  processing database "postgres"        -- same DB, concurrently
  [2093299] WARNING:  cannot set data checksums to "on", current state is not "inprogress-on"
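
One possible reading of the WARNING, sketched as a standalone model: if the transition to "on" is only accepted from "inprogress-on" (which is what the message text suggests), then whichever of two concurrent launchers finishes second finds the state already changed. The enum and function names below are hypothetical, and this reading has not been checked against the source.

```c
#include <stdbool.h>

/* Hypothetical three-state model; the real code may have more states. */
typedef enum
{
    MODEL_OFF,
    MODEL_INPROGRESS_ON,
    MODEL_ON
} ModelChecksumState;

/* Accept the switch to "on" only from "inprogress-on" (assumed rule). */
static bool
model_set_on(ModelChecksumState *state)
{
    if (*state != MODEL_INPROGRESS_ON)
        return false;       /* would correspond to the WARNING in the log */
    *state = MODEL_ON;
    return true;
}

/* Two launchers finish one after the other; count the rejected attempts. */
static int
model_count_rejected_finishes(void)
{
    ModelChecksumState state = MODEL_INPROGRESS_ON;
    int rejected = 0;

    if (!model_set_on(&state))  /* first finisher: accepted */
        rejected++;
    if (!model_set_on(&state))  /* second finisher: state is already "on" */
        rejected++;

    return rejected;
}
```

Under these assumptions exactly one of the two attempts is rejected, which would match a single spurious WARNING from the extra launcher.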

I think the process-local launcher_running flag exists for this purpose and is already used for the worker-kill block, but the flag-clear and state-revert blocks do not use it.

The attached patch returns early from launcher_exit() when the local flag is false. Thoughts?
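
The idea of the guard can be illustrated with a minimal standalone sketch. It is a model only: the type and function names are invented for illustration, and the real launcher_exit() does more (signalling the worker, possibly reverting state) than this shows.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared flag; not the real shared struct. */
typedef struct
{
    bool launcher_running;
} SketchShared;

/* Claim the role if free; *local_running records whether this process owns it. */
static void
sketch_claim(SketchShared *shared, bool *local_running)
{
    *local_running = false;
    if (!shared->launcher_running)
    {
        shared->launcher_running = true;
        *local_running = true;
    }
}

/* Guarded exit: a process that never claimed the role leaves shared state alone. */
static void
sketch_exit_guarded(SketchShared *shared, bool *local_running)
{
    if (!*local_running)
        return;             /* redundant launcher: nothing to clean up */
    shared->launcher_running = false;
    *local_running = false;
}

/* Same three-call sequence, now with the guard; returns the number of owners. */
static int
sketch_count_owners_with_guard(void)
{
    SketchShared shared = {false};
    bool first, second, third;

    sketch_claim(&shared, &first);          /* wins the role */
    sketch_claim(&shared, &second);         /* redundant */
    sketch_exit_guarded(&shared, &second);  /* no-op: never owned the flag */
    sketch_claim(&shared, &third);          /* rejected: flag is still set */

    return (int) first + (int) third;
}
```

With the guard in place the model ends with a single owner, so the third call takes the "already running" path instead of being admitted.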

Regards,
Ayush
--000000000000812652064fd5c4ce-- --000000000000812653064fd5c4d0 Content-Type: application/octet-stream; name="0001-Fix-race-in-online-checksums-launcher_exit.patch" Content-Disposition: attachment; filename="0001-Fix-race-in-online-checksums-launcher_exit.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_mo66yh9l0 RnJvbSAzMzZkNWI2NzExNTdmOTc0Y2NlZWYxNTM4NWUzOTRjYTI3ZGQ1OGYyIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBBeXVzaCBUaXdhcmkgPGF5dXNodGl3YXJpLnNsZzAxQGdtYWls LmNvbT4KRGF0ZTogTW9uLCAyMCBBcHIgMjAyNiAwMDo1MjowMCArMDUzMApTdWJqZWN0OiBbQlVH XSBGaXggcmFjZSBpbiBvbmxpbmUgY2hlY2tzdW1zIGxhdW5jaGVyX2V4aXQoKQoKV2hlbiBwZ19l bmFibGVfZGF0YV9jaGVja3N1bXMoKSBpcyBjYWxsZWQgdHdpY2UgYmVmb3JlIHRoZSBmaXJzdAps YXVuY2hlciBzdGFydHMsIHR3byBsYXVuY2hlciBwcm9jZXNzZXMgYXJlIHJlZ2lzdGVyZWQuICBU aGUgc2Vjb25kCihyZWR1bmRhbnQpIGxhdW5jaGVyIGV4aXRzIGVhcmx5IGFmdGVyIHNlZWluZyBs YXVuY2hlcl9ydW5uaW5nIGlzCmFscmVhZHkgc2V0LCBidXQgaXRzIGxhdW5jaGVyX2V4aXQoKSBj YWxsYmFjayB1bmNvbmRpdGlvbmFsbHkgY2xlYXJzCnRoZSBzaGFyZWQgRGF0YUNoZWNrc3VtU3Rh dGUtPmxhdW5jaGVyX3J1bm5pbmcgZmxhZyBhbmQgbWF5IGNhbGwKU2V0RGF0YUNoZWNrc3Vtc09m ZigpLiAgVGhpcyBhbGxvd3MgYSB0aGlyZCBsYXVuY2hlciB0byBzdGFydApjb25jdXJyZW50bHkg d2l0aCB0aGUgZmlyc3QsIGFuZCBjYW4gc2lsZW50bHkgcmV2ZXJ0IHRoZSBjbHVzdGVyCmNoZWNr c3VtIHN0YXRlIHRvIG9mZiB3aGlsZSB0aGUgZmlyc3QgbGF1bmNoZXIgaXMgc3RpbGwgd29ya2lu Zy4KCkZpeCBieSByZXR1cm5pbmcgZWFybHkgZnJvbSBsYXVuY2hlcl9leGl0KCkgd2hlbiB0aGUg cHJvY2Vzcy1sb2NhbApsYXVuY2hlcl9ydW5uaW5nIGZsYWcgaXMgZmFsc2UsIGluZGljYXRpbmcg dGhpcyBwcm9jZXNzIG5ldmVyIGNsYWltZWQKdGhlIGxhdW5jaGVyIHJvbGUuCi0tLQogc3JjL2Jh Y2tlbmQvcG9zdG1hc3Rlci9kYXRhY2hlY2tzdW1fc3RhdGUuYyB8IDI1ICsrKysrKysrKysrKyst LS0tLS0tLQogMSBmaWxlIGNoYW5nZWQsIDE2IGluc2VydGlvbnMoKyksIDkgZGVsZXRpb25zKC0p CgpkaWZmIC0tZ2l0IGEvc3JjL2JhY2tlbmQvcG9zdG1hc3Rlci9kYXRhY2hlY2tzdW1fc3RhdGUu YyBiL3NyYy9iYWNrZW5kL3Bvc3RtYXN0ZXIvZGF0YWNoZWNrc3VtX3N0YXRlLmMKaW5kZXggMTg3 OTdhOGVlM2QuLjc2ZjVhYTAwZjJiIDEwMDY0NAotLS0gYS9zcmMvYmFja2VuZC9wb3N0bWFzdGVy 
L2RhdGFjaGVja3N1bV9zdGF0ZS5jCisrKyBiL3NyYy9iYWNrZW5kL3Bvc3RtYXN0ZXIvZGF0YWNo ZWNrc3VtX3N0YXRlLmMKQEAgLTg4NywxNyArODg3LDI0IEBAIGxhdW5jaGVyX2V4aXQoaW50IGNv ZGUsIERhdHVtIGFyZykKIHsKIAlhYm9ydF9yZXF1ZXN0ZWQgPSBmYWxzZTsKIAotCWlmIChsYXVu Y2hlcl9ydW5uaW5nKQorCS8qCisJICogT25seSBwZXJmb3JtIGNsZWFudXAgaWYgd2UgYWN0dWFs bHkgY2xhaW1lZCB0aGUgbGF1bmNoZXIgcm9sZSBieQorCSAqIHNldHRpbmcgdGhlIHNoYXJlZCBs YXVuY2hlcl9ydW5uaW5nIGZsYWcuICBBIHJlZHVuZGFudCBsYXVuY2hlciB0aGF0CisJICogZm91 bmQgYW5vdGhlciBsYXVuY2hlciBhbHJlYWR5IHJ1bm5pbmcgd2lsbCBoYXZlIGV4aXRlZCBlYXJs eSB3aXRob3V0CisJICogc2V0dGluZyB0aGUgbG9jYWwgbGF1bmNoZXJfcnVubmluZyBmbGFnLCBh bmQgbXVzdCBub3QgdG91Y2ggdGhlIHNoYXJlZAorCSAqIHN0YXRlIG93bmVkIGJ5IHRoZSBhY3Rp dmUgbGF1bmNoZXIuCisJICovCisJaWYgKCFsYXVuY2hlcl9ydW5uaW5nKQorCQlyZXR1cm47CisK KwlMV0xvY2tBY3F1aXJlKERhdGFDaGVja3N1bXNXb3JrZXJMb2NrLCBMV19FWENMVVNJVkUpOwor CWlmIChEYXRhQ2hlY2tzdW1TdGF0ZS0+d29ya2VyX3BpZCAhPSBJbnZhbGlkUGlkKQogCXsKLQkJ TFdMb2NrQWNxdWlyZShEYXRhQ2hlY2tzdW1zV29ya2VyTG9jaywgTFdfRVhDTFVTSVZFKTsKLQkJ aWYgKERhdGFDaGVja3N1bVN0YXRlLT53b3JrZXJfcGlkICE9IEludmFsaWRQaWQpCi0JCXsKLQkJ CWVyZXBvcnQoTE9HLAotCQkJCQllcnJtc2coImRhdGEgY2hlY2tzdW1zIGxhdW5jaGVyIGV4aXRp bmcgd2hpbGUgd29ya2VyIGlzIHN0aWxsIHJ1bm5pbmcsIHNpZ25hbGxpbmcgd29ya2VyIikpOwot CQkJa2lsbChEYXRhQ2hlY2tzdW1TdGF0ZS0+d29ya2VyX3BpZCwgU0lHVEVSTSk7Ci0JCX0KLQkJ TFdMb2NrUmVsZWFzZShEYXRhQ2hlY2tzdW1zV29ya2VyTG9jayk7CisJCWVyZXBvcnQoTE9HLAor CQkJCWVycm1zZygiZGF0YSBjaGVja3N1bXMgbGF1bmNoZXIgZXhpdGluZyB3aGlsZSB3b3JrZXIg aXMgc3RpbGwgcnVubmluZywgc2lnbmFsbGluZyB3b3JrZXIiKSk7CisJCWtpbGwoRGF0YUNoZWNr c3VtU3RhdGUtPndvcmtlcl9waWQsIFNJR1RFUk0pOwogCX0KKwlMV0xvY2tSZWxlYXNlKERhdGFD aGVja3N1bXNXb3JrZXJMb2NrKTsKIAogCS8qCiAJICogSWYgdGhlIGxhdW5jaGVyIGlzIGV4aXRp bmcgYmVmb3JlIGRhdGEgY2hlY2tzdW1zIGFyZSBlbmFibGVkIHRoZW4gc2V0Ci0tIAoyLjM0LjEK Cg== --000000000000812653064fd5c4d0--