From: Arseniy Mukhin
Date: Sat, 18 Oct 2025 19:41:46 +0300
Subject: Re: Optimize LISTEN/NOTIFY
To: Tom Lane
Cc: Joel Jacobson, Chao Li, pgsql-hackers
In-Reply-To: <1547585.1760645808@sss.pgh.pa.us>

On Thu, Oct 16, 2025 at 12:39 PM Joel Jacobson wrote:
>
> On Wed, Oct 15, 2025, at 16:16, Tom Lane wrote:
> > Arseniy Mukhin writes:
> >> I think "Direct advancement" is a good idea. But the way it's
> >> implemented now has a concurrency bug. Listeners store its current
> >> position in the local variable 'pos' during the reading in
> >> asyncQueueReadAllNotifications() and don't hold NotifyQueueLock. It
> >> means that some notifier can directly advance the listener's position
> >> while the listener has an old value in the local variable. The same
> >> time we use listener positions to find out the limit we can truncate
> >> the queue in asyncQueueAdvanceTail().
> >
> > Good catch!
>
> I've implemented the three ideas presented below, attached as .txt files
> that are diffs on top of v19, which has these changes since v17:
>

Thank you for the new version and all implementations!

> 0002-optimize_listen_notify-v19.patch:
> * Improve wording of top comment per request from Chao Li.
> * Add initChannelHash call to top of SignalBackends,
>   to fix bug reported by Arseniy Mukhin.
>
> > I think we can perhaps salvage the idea if we invent a separate
> > "advisory" queue position field, which tells its backend "hey,
> > you could skip as far as here if you want", but is not used for
> > purposes of SLRU truncation.
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt1.txt

        pos = QUEUE_BACKEND_POS(i);

        /* Direct advancement for idle backends at the old head */
        if (pendingNotifies != NULL &&
            QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite))
        {
            QUEUE_BACKEND_ADVISORY_POS(i) = queueHeadAfterWrite;

If we have several notifying backends, it looks like only the first
one will be able to do direct advancement here. The next notifying
backend will fail the QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite) check,
because we don't wake up the listener and pos will be the same as it
was for the first notifying backend. It seems that to accumulate
direct advancement from several notifying backends we need to compare
queueHeadBeforeWrite with advisoryPos here. And we also need to
advance advisoryPos to the listener's position after reading if
advisoryPos falls behind.

A minute of brainstorming:

I also thought about a workload that is probably common in practice.
Let's say we have this sequence of notifications:

F F F T F F F T F F F T

Here F is a notification from a channel the listener doesn't care
about, and T is the opposite.
It seems that after the first 'T' notification it will be more
difficult for notifying backends to do 'direct advancement', as there
will be some lag before the listener reads the notification and
advances its position.
Not sure if it's a problem; it probably depends on the intensity of
notifications. But maybe we can use a slightly more sophisticated data
structure here? Something like a list of skip ranges. Every entry in
the list is a range (pos1, pos2) that the listener can skip while
reading. So instead of advancing advisoryPos, every notifying backend
would add a skip range to the list. Notifying backends can merge
neighbouring ranges: (pos1, pos2) & (pos2, pos3) -> (pos1, pos3). We
can also limit the number of entries, to 5 for example. Listeners on
their side should clear the list before reading and skip all ranges
from it. What do you think? Is it overkill?

>
> > Alternatively, split the queue pos
> > into "this is where to read next" and "this is as much as I'm
> > definitively done with", where the second field gets advanced at
> > the end of asyncQueueReadAllNotifications.  Not sure which
> > view would be less confusing (in the end I guess they're nearly
> > the same thing, differently explained).
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt2.txt
>

IMHO it's a little bit more confusing than the first option. Two
points I noticed:

1) We have a fast path in asyncQueueReadAllNotifications()

    if (QUEUE_POS_EQUAL(pos, head))
    {
        /* Nothing to do, we have read all notifications already. */
        return;
    }

Should we update donePos here? It looks like donePos may never be
updated without it.

2) In SignalBackends()

    /* Signal backends that have fallen too far behind */
    lag = asyncQueuePageDiff(QUEUE_POS_PAGE(QUEUE_HEAD),
                             QUEUE_POS_PAGE(pos));

    if (lag >= QUEUE_CLEANUP_DELAY)
    {
        pid = QUEUE_BACKEND_PID(i);
        Assert(pid != InvalidPid);

        QUEUE_BACKEND_WAKEUP_PENDING(i) = true;
        pids[count] = pid;
        procnos[count] = i;
        count++;
    }

Should we use donePos here, as it is responsible for queue truncation now?
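
Coming back to the skip-range list idea from the alt1 discussion: to
make it a bit more concrete, here is a rough standalone sketch. It is
not taken from any of the patches. It assumes a position is a single
increasing integer (the real QueuePosition is a page/offset pair), and
the names Pos, SkipRange, SkipList, skiplist_add and
skiplist_next_read_pos are made up for illustration. Locking is only
described in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one increasing integer stands in for the real QueuePosition. */
typedef int64_t Pos;

#define MAX_SKIP_RANGES 5		/* the "5 for example" limit */

typedef struct SkipRange
{
	Pos			start;			/* first position the listener may skip */
	Pos			end;			/* first position it has to read again */
} SkipRange;

typedef struct SkipList
{
	int			nranges;
	SkipRange	ranges[MAX_SKIP_RANGES];
} SkipList;

/*
 * Notifier side, assumed to run with the queue lock held exclusively.
 * Returns false when the list is full; the caller would then have to
 * fall back to waking the listener.
 */
bool
skiplist_add(SkipList *sl, Pos start, Pos end)
{
	/* Merge with the last range if the new one starts where it ends. */
	if (sl->nranges > 0 && sl->ranges[sl->nranges - 1].end == start)
	{
		sl->ranges[sl->nranges - 1].end = end;
		return true;
	}
	if (sl->nranges == MAX_SKIP_RANGES)
		return false;
	sl->ranges[sl->nranges].start = start;
	sl->ranges[sl->nranges].end = end;
	sl->nranges++;
	return true;
}

/*
 * Listener side, applied to a private copy of the list that was taken
 * (and the shared list cleared) under the lock.  Ranges are appended in
 * queue order, so a single pass is enough.
 */
Pos
skiplist_next_read_pos(const SkipList *sl, Pos pos)
{
	for (int i = 0; i < sl->nranges; i++)
	{
		if (pos >= sl->ranges[i].start && pos < sl->ranges[i].end)
			pos = sl->ranges[i].end;
	}
	return pos;
}
```

With the F F F T F F F T sequence above, the three F entries before
each T collapse into one range, so a listener positioned at the first
F of a group would jump straight to the following T.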

> > A different line of thought could be to get rid of
> > asyncQueueReadAllNotifications's optimization of moving the
> > queue pos only once, per
> >
> >      * (We could alternatively retake NotifyQueueLock and move the position
> >      * before handling each individual message, but that seems like too much
> >      * lock traffic.)
> >
> > Since we only need shared lock to advance our own queue pos,
> > maybe that wouldn't be too awful.  Not sure.
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt3.txt
>

Hmm, it seems we still have the race: at the beginning of
asyncQueueReadAllNotifications we read pos into the local variable
and release the lock. IIUC, to avoid the race without introducing
another field, the listener needs to hold the lock until it updates
its position, so that the notifying backend cannot change it
concurrently.


Best regards,
Arseniy Mukhin