Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEJG0-009ogI-QA for pgsql-hackers@arkaria.postgresql.org; Thu, 30 Oct 2025 03:23:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vEJFy-005UY9-Jr for pgsql-hackers@arkaria.postgresql.org; Thu, 30 Oct 2025 03:23:21 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEJFy-005UY1-8x for pgsql-hackers@lists.postgresql.org; Thu, 30 Oct 2025 03:23:21 +0000 Received: from mail-pg1-x52c.google.com ([2607:f8b0:4864:20::52c]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vEJFu-004zqj-1c for pgsql-hackers@postgresql.org; Thu, 30 Oct 2025 03:23:20 +0000 Received: by mail-pg1-x52c.google.com with SMTP id 41be03b00d2f7-b6ce806af3eso533614a12.0 for ; Wed, 29 Oct 2025 20:23:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1761794596; x=1762399396; darn=postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Q2a7+dw3eVMrF3dMWaHjBswDJI13sc1/ueg1n8s21iE=; b=ayKNtEz+J958gtmJMQmSZiCkV8JtPE9wkG5l2aIqPM7/uC++ReG8cOsaDsnK5kQtpV O5XLigUw6sAcuXaCCOIAe5MC0CRtivB5OsB0bZ+ZOKXluadN7fs3bgeV7F0CDH5/M8xB fgy7akpm/fKPa8wmpcRKDgiHhdnTCE+biaSNc7ne/Oog3PC5dswTqeMTJC9eC5M++/iU z4sKPFNDJmdd11HPghAIKk3jBA7JQ11o2KCadhu0uK2/HtztUZg8ohcmJa5Dp8m8i/Qo 6tIO3e9pjhNs+hNcYXdisGp8ZvTTD67mgoFSNWIPkcShWXXxZDN5IJKWnHQ9VvQ+ZwOW xb5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761794596; x=1762399396; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Q2a7+dw3eVMrF3dMWaHjBswDJI13sc1/ueg1n8s21iE=; b=Rr3f3DlDJ/Ti6PnWr/QRnJgpeXFDoJdGshKXoIvyLfqdQJm+4sUXneNZ4Y71b/oJsB iQN+WXAKrmeZ4KK/atZVKWTxonpFkeSfXvKX1Q9M0g9sLhFWFgU9h/xauZTkPMRsnUgO EziH8H0l6mR9loxe8M7kVqmiJRcTtzFBzb+4zgFqQtxpd8W25nskR3uE43bL+V+afL3x XrY/+xWfxF4Enu++zk6T9dCelfIJubqpfnRALELlQAeMT9kZJchskLbEv32Sv4jWuLVF WtHP+UhwZDjygNjGPC/O9s/33unChVuwtHhRrEAVCvWDU/HDPhN/Y0bFJeP+YfydsAKu sixQ== X-Gm-Message-State: AOJu0Yzg5USXItfcYA9oG/lbFpQhzfUTy4UeReIc4OCyg2zRmyPFDzIc EKbr4BLXHYlA0V/5FaOuJzU1gqE3yl8zFoQFuWooTRKjBjBFb3YbsrPJv87VmUOXNqc= X-Gm-Gg: ASbGnctGLMAFiBQzYH2eRFgwySCLDjbXcGfyjpHN+BISfOShzSsCi3ftYFWpO1OAIu7 1zu1V9QLkAPaMgSoFF4BkiG1KpZV3aG4QwHTxHBz5OQmizqwobYiGQ/qyhH0ARnj8QkalqoFj2X JuXJflfC5i4e9yn6D1nZ7IHmWL79NcOHQEhA4p0womI+ffN08wNWciM6Pcsfsp7aJVy231+0eqs AxaIZazwvRO/Roz+KalrUfFr2jIzqIRRMhhpd1faW3l0lpB8jQQrEc8BAKvePkmGgC0GfID/t2p kXpVJ+GE4y0e7Q8LfQYHH6DJ7YcNCWgyGsfXouV4ulL5mUR9/aZBzOR+h079YQOYNbAj8uLrqOA zxMJawhnyhvpZXWvvpaBLOzXXmSbCuwQkodo7a0PpxJHETkGOZCBgI5ULsKwKNTuk5TnOaTQ6GG ZvmTqKjv0CDpF+KCV0kZ13rg== X-Google-Smtp-Source: AGHT+IG734EwcyddljzGuIWq3iKld66mI4oF1t7ntXAsQQtZaCouYc+rrRcwAapw9Z5Y+g7EO6SCeA== X-Received: by 2002:a17:903:2a86:b0:288:e46d:b32b with SMTP id d9443c01a7336-294dee2ed04mr62773665ad.17.1761794595709; Wed, 29 Oct 2025 20:23:15 -0700 (PDT) Received: from smtpclient.apple ([170.178.170.211]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-29498cf4aa5sm167145345ad.13.2025.10.29.20.23.13 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 29 Oct 2025 20:23:15 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\)) Subject: Re: Optimize LISTEN/NOTIFY From: Chao Li In-Reply-To: <7556f0d4-03fd-451a-bd34-5f62b424319a@app.fastmail.com> Date: Thu, 30 Oct 2025 11:22:40 +0800 Cc: pgsql-hackers Content-Transfer-Encoding: quoted-printable Message-Id: <290910DE-9A03-4AE6-B348-073D5DA96ACC@gmail.com> References: <6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com> <165530.1752362320@sss.pgh.pa.us> <02a7cd37-e2fc-4212-8b19-f8c239c95fb8@app.fastmail.com> <96f00bf1-cc9d-4520-9d02-9e14e7767c88@app.fastmail.com> <30c2aa7d-dd6c-4b68-a2e4-f217a1a34acf@app.fastmail.com> <0b4d402a-9ac2-4aa8-acf8-8231dbe579ea@app.fastmail.com> <3095599.1758644879@sss.pgh.pa.us> <0dc6a2cc-5216-4dc1-9dd2-430cafc6095b@app.fastmail.com> <52CC167F-763B-4ECA-B0B4-DAB381816828@gmail.com> <9186C6D0-F7A9-482A-9183-89E530B57E36@gmail.com> <1073593.1759423179@sss.pgh.pa.us> <4bd5e6c4-6fa7-44bb-869d-59a32a331fa8@app.fastmail.com> <85828f29-e72e-4400-94f3-9a69bc8dc239@app.fastmail.com> <2495353.1759860890@sss.pgh.pa.us> <8aeae418-92a6-4bbd-9c06-9574c79e59f7@app.fastmail.com> <2531672.1759868124@sss.pgh.pa.us> <474efa78-337c-41cd-a73a-f845a0115109@app.fastmail.com> <2749343.1759949176@sss.pgh.pa.us> <8bfca2be-1ec0-4e15-aafb-0b7b661fe936@app.fastmail.com> <9eba307f-f2fb-48f0-9507-2e197f39ef9e@app.fastmail.com> <8c71183a-0d28-4bcf-a806-78446ff95404@app.fastmail.com> <1009807.1760476747@sss.pgh.pa.us> <1F7227F5-C33D-4E2C-8511-33F1468590D0@gmail.com> <0a5a20d3-4621-46b3-b2ab-903f63a20dea@app.fastmail.com> <6F913129-ABEF-4004-AAF3-F22FC34!29AE8@gmail.com> <1547585.1760645808@sss.pgh.pa.us> <14865EB6-0BF4-462B-9072-10BDAC10C052@gmail.com> <0BCA1C2D-B92C-459E-B1A6-6D06BA4C62CF@gmail.com> <55d24cbb-e9ef-491f-a99b-b3dbd7cecdf9@app.fastmail.com> <38574cad-e90d-47b7-a015-753bb6bbc360@app.fastmail.com> <66631FB7-5BEA-4ED5-A694-9AD8B9CCFEE8@gmail.com> <4b7b49a5-5e1a-44a8-93e0-60457d15cb1d@app.fastmail.com> <82DEA2B6-6FC5-4A79-BDE3-1FD72F104A6E@gmail.com> <38de1036-d8cf-420c-b845-edb5a946b191@app.fastmail.com> <87E40BF8-8877-4DBD-9040-99AF8A4E6358@gmail.com> <7556f0d4-03fd-451a-bd34-5f62b424319a@app.fastmail.com> To: Joel Jacobson X-Mailer: Apple Mail (2.3826.700.81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Oct 29, 2025, at 18:33, Joel Jacobson wrote: >=20 > On Wed, Oct 29, 2025, at 08:05, Chao Li wrote: >>=20 >> I think the current implementation still has a race problem. >>=20 >> Let=E2=80=99s say notifier N1 notifies listener=E2=80=99s L1 to read = message. >> L1 starts to read: it acquires the look, gets reading range, then=20 >> releases the lock, start performs reading without holding the lock. >> Notifier N2 comes, N2 doesn=E2=80=99t have anything L1 is interested = in. N2 now=20 >> holds the look, when it checks "if (QUEUE_POS_EQUAL(pos,=20 >> queueHeadBeforeWrite))=E2=80=9D, here comes the race. Because the = lock is in=20 >> N2=E2=80=99s hand, L1 cannot get the lock to update its pos, so "if=20= >> (QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite))=E2=80=9D will not be = satisfied, so=20 >> direct advancement won=E2=80=99t happen. >=20 > I'm not sure I agree that qualifies as a race "problem" per se, since = I > think that just sounds like a case where we would do an unnecessary > wakeup, right? >=20 Why unnecessary? Say there are 100 listeners L1 - L100. When N2 is = checking state of L1, L100 has finished reading, ideally L100 should = update its pos, then when N2 reaches L100, it should do direct = advancement, right? But now the problem is, we use a single notification lock to handle all = notifiers and listeners. Assume if every backend process has a = notification lock, then the race is no longer there. When N2 is checking = state of L1, it just holds L1=E2=80=99s lock, so L100 can go ahead = update its pos, then when N2 reaches L100, N2 can do direct advancement. I ever thought to propose to use a lock for every backend process, but I = didn=E2=80=99t, because a lock is underlying an expensive semaphore, if = there are hundreds of backends, adding the same number of semaphores = doesn=E2=80=99t seem a good thing, which would be a too many overheads = to the system. > Without more sophisticated data structures (e.g. skip ranges) and > increased code complexity, there will always be cases where we will by > do unnecessary wakeups, which IMO need not be a design goal to > completely avoid, until we have benchmark data that indicates = otherwise. >=20 The other problem I see is that, we don=E2=80=99t have a way to evaluate = if the =E2=80=9Cdirect advancement=E2=80=9D is really effective, such as = 1) if a case that can perform =E2=80=9Cdirect advancement=E2=80=9D is = really applied the advancement; 2) in a test model, how many =E2=80=9Cdire= ct advancement=E2=80=9D are applied. > I think we should iterate by first trying to reason about correctness = of > the code, trying to prove/disprove if a notifications could ever end = up > not being delivered. The bug I fixed in v22 is an example of such a > case, that would cause a listening backend to never be awaken, since > notifiers would not signal it due to the pending wake that was not > cleared. >=20 > I wonder if there could be more such serious bugs in the current code. = I > will focus my efforts now trying to answer that question. Would be > really nice if we could find a way to reason formally about this. I've > been looking into the P programming language, which seems suitable for > modeling and verifying these kind of asynchronous concurrency = protocols, > I will give it a try. >=20 I don=E2=80=99t think we need to rush. =46rom my observation, none of = the =E2=80=9Cbig=E2=80=9D patches can get merged quickly anyway. Rather = than hurrying to make it =E2=80=9Cready,=E2=80=9D I think it=E2=80=99s = better to take the time to make it =E2=80=9Cperfect=E2=80=9D. I have = also spent a lot of time on this patch, and I don=E2=80=99t mind to = spend more. If you need a hand, I will be happy to offer. TBH, with all the problems I described earlier still in my brain, I just = cannot convince myself to let this patch go yet. Sorry about that. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/