Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v6VYM-005fkZ-Gf for pgsql-hackers@arkaria.postgresql.org; Wed, 08 Oct 2025 14:54:06 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1v6VYK-00AsNb-7U for pgsql-hackers@arkaria.postgresql.org; Wed, 08 Oct 2025 14:54:05 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v6VYJ-00AsNJ-Q3 for pgsql-hackers@lists.postgresql.org; Wed, 08 Oct 2025 14:54:04 +0000 Received: from flow-a8-smtp.messagingengine.com ([103.168.172.143]) by makus.postgresql.org with smtp (Exim 4.96) (envelope-from ) id 1v6VYH-000gd7-2M for pgsql-hackers@postgresql.org; Wed, 08 Oct 2025 14:54:03 +0000 Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailflow.phl.internal (Postfix) with ESMTP id 5810313800C7; Wed, 8 Oct 2025 10:54:01 -0400 (EDT) Received: from phl-imap-03 ([10.202.2.93]) by phl-compute-04.internal (MEProxy); Wed, 08 Oct 2025 10:54:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=compiler.org; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1759935241; x=1759942441; bh=PSICu79Vy7nRnRSQsW+xN70SLR9P3xwN7cFmAYXo/Z4=; b= OacqT6m88Qh+htcL7aEtG2DUzVfWRxo3vuJzVJc9fPxSSYP5DqNxsNDk1BlkdfZD OTf+gLtruZ17tXl0IHynvavZHpb6XCaTgHAvNmnUuPAhrVEZK6oi+D+BTWYkEUlo N9MPcD0nuCj98Ar9UCWJtnQbEkXUsP0LYV1GiS/WtiaJYl0QJ3mOFUoHF8yPu5P5 IIsH1YhVGlacVVXx1DGrxxZ31YCjf1+HT54uiffiPOr1hSmcMOHN1Ck9qbGzvsit 08gJgBs9pNSLy3O2q5B2TDMSQzTo1BeAgnrfWKxtdjoknWHvs4FXmzaBqnGg66za 2OfE2SAdaRbeHPBNdIqFXw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t=1759935241; x= 1759942441; bh=PSICu79Vy7nRnRSQsW+xN70SLR9P3xwN7cFmAYXo/Z4=; b=i 0C24AsftDwz/m9nGVpzQopaqmT8GW4CrZw735dp/WF/qkF5uZ1+bG7xDKzNv19XW p6hknXP65f2bBvCME7iZom7QtP+yueLSuNZ/4jxBHGR38ANUB/xn7A14sETSKNMg 999xHH44R445jAKjDGd+Uu7R+Ui5UcFapB8l2kt6zZ0HOgXq1kq3URH5rHwwohWZ YAQAUI0w9AwGoQRvLdhI9RuONqS4+2gkcvg0A86iSS3nHd67q8vqhLA4vmx2v8YS U+ceSCxMsBg2+63FoO5BFd1xaLp5XfQWJaY+B/yWQsUo29VuPmL9HJEOyOt03ywN FDRj+5laWoaR2JJsYrtXg== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdeggddutdefheelucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucenucfjughrpefoggffhffvvefkjghfufgtgfesthhqre dtredtjeenucfhrhhomhepfdflohgvlhculfgrtghosghsohhnfdcuoehjohgvlhestgho mhhpihhlvghrrdhorhhgqeenucggtffrrghtthgvrhhnpefggfeuheefheeufeegkeeife ffhffgvefftdduuefhfeelteegleehhfdtieevgfenucevlhhushhtvghrufhiiigvpedt necurfgrrhgrmhepmhgrihhlfhhrohhmpehjohgvlhestghomhhpihhlvghrrdhorhhgpd hnsggprhgtphhtthhopeegpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehlihdr vghvrghnrdgthhgrohesghhmrghilhdrtghomhdprhgtphhtthhopehmrghthhgvuhhssh hsihhlvheljeesghhmrghilhdrtghomhdprhgtphhtthhopehpghhsqhhlqdhhrggtkhgv rhhssehpohhsthhgrhgvshhqlhdrohhrghdprhgtphhtthhopehtghhlsehsshhsrdhpgh hhrdhprgdruhhs X-ME-Proxy: Feedback-ID: ic6394509:Fastmail Received: by mailuser.phl.internal (Postfix, from userid 501) id E952718E0054; Wed, 8 Oct 2025 10:54:00 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface MIME-Version: 1.0 X-ThreadId: AZ7XRsRLdGms Date: Wed, 08 Oct 2025 16:53:33 +0200 From: "Joel Jacobson" To: "Chao Li" Cc: "Tom Lane" , "Matheus Alcantara" , pgsql-hackers Message-Id: <2790345f-03c3-4cae-8f14-886ed9079319@app.fastmail.com> In-Reply-To: References: <6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com> <165530.1752362320@sss.pgh.pa.us> <02a7cd37-e2fc-4212-8b19-f8c239c95fb8@app.fastmail.com> <96f00bf1-cc9d-4520-9d02-9e14e7767c88@app.fastmail.com> <30c2aa7d-dd6c-4b68-a2e4-f217a1a34acf@app.fastmail.com> <0b4d402a-9ac2-4aa8-acf8-8231dbe579ea@app.fastmail.com> <3095599.1758644879@sss.pgh.pa.us> <0dc6a2cc-5216-4dc1-9dd2-430cafc6095b@app.fastmail.com> <52CC167F-763B-4ECA-B0B4-DAB381816828@gmail.com> <9186C6D0-F7A9-482A-9183-89E530B57E36@gmail.com> <1073593.1759423179@sss.pgh.pa.us> <4bd5e6c4-6fa7-44bb-869d-59a32a331fa8@app.fastmail.com> <85828f29-e72e-4400-94f3-9a69bc8dc239@app.fastmail.com> <2495353.1759860890@sss.pgh.pa.us> <8aeae418-92a6-4bbd-9c06-9574c79e59f7@app.fastmail.com> Subject: Re: Optimize LISTEN/NOTIFY Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Oct 8, 2025, at 05:43, Chao Li wrote: > After several rounds of reviewing, the code is already very good. I=20 > just got a few small comments: Thanks for feedback! The below changes have been incorporated into the v12 version sent in my previous email. >> On Oct 8, 2025, at 03:26, Joel Jacobson wrote: >>=20 >>=20 >> /Joel > > > 1 > ``` > + channels =3D GetPendingNotifyChannels(); > + > LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE); > - for (ProcNumber i =3D QUEUE_FIRST_LISTENER; i !=3D INVALID_PROC_NUMB= ER; i=20 > =3D QUEUE_NEXT_LISTENER(i)) > + foreach(lc, channels) > ``` > > I don=E2=80=99t see where =E2=80=9Cchannels=E2=80=9D is freed. GetPend= ingNotifyChannels()=20 > creates a list of Nodes, both the list and Nodes the list points to=20 > should be freed. Per suggestion from Tom Lane I reverted back GetPendingNotifyChannels(), so this comment is not applicable any longer. > 2 > ``` > + foreach(lc, channels) > { > - int32 pid =3D QUEUE_BACKEND_PID(i); > - QueuePosition pos; > + char *channel =3D strVal(lfirst(lc)); > + ChannelEntry *entry; > + ProcNumber *listeners; > + ChannelHashKey key; >=20 > - Assert(pid !=3D InvalidPid); > - pos =3D QUEUE_BACKEND_POS(i); > - if (QUEUE_BACKEND_DBOID(i) =3D=3D MyDatabaseId) > + if (channel_hash =3D=3D NULL) > + entry =3D NULL; > + else > ``` > > I wonder whether or not =E2=80=9Cchannel_hash=E2=80=9D can be NULL her= e? Maybe possible=20 > if a channel is un-listened while the event is pending? Yes, I think channelHash can be NULL here if doing a NOTIFY when there hasn't been a LISTEN yet. > So, maybe add a comment here to explain the logic. Not sure I think that's necessary. What do you suggest that comment would say? > 3 > The same piece of code as 2. > > I think the code can be optimized a little bit. First, we can=20 > initialize entry to NULL, then we don=E2=80=99t the if-else. Second, =E2= =80=9Ckey=E2=80=9D is=20 > only used for dshash_find(), so it can defined where it is used. > > foreach(lc, channels) > { > char *channel =3D strVal(lfirst(lc)); > ChannelEntry *entry =3D NULL; > ProcNumber *listeners; > //ChannelHashKey key; > > if (channel_hash !=3D NULL) > { > ChannelHashKey key; > ChannelHashPrepareKey(&key, MyDatabaseId, channel); > entry =3D dshash_find(channel_hash, &key, false); > } > > if (entry =3D=3D NULL) > continue; /* No listeners registered for this channel */ Nice, I agree that's more readable, I changed it like that. > 4 > ``` > + if (signaled[i] || QUEUE_BACKEND_WAKEUP_PENDING(i)) > + continue; > ``` > > I wonder if =E2=80=9Csignaled[i]=E2=80=9D is a duplicate flag of=20 > "QUEUE_BACKEND_WAKEUP_PENDING(i)=E2=80=9D?=20 > > I understand signaled is local, and QUEUE_BACKEND_WAKEUP_PENDING is in=20 > shared memory and may be set by other processes, but in local, when=20 > signaled[I] is set, QUEUE_BACKEND_WAKEUP_PENDING(i) is also set. And=20 > because of NotifyQueueLock, other process should not be able to cleanu= p=20 > the flag. > > But if =E2=80=9Csignals=E2=80=9D is really needed, maybe we can use Bi= tmapset=20 > (src/backend/nodes/bitmapset.c), that would use 1/8 of memories=20 > comparing to the bool array. I agree, since we're holding an exclusive lock, the signaled array is re= undant. I've removed it, so that we rely only on the wakeupPending flag. > 5 > ``` > /* > @@ -1865,6 +2087,7 @@ asyncQueueReadAllNotifications(void) > LWLockAcquire(NotifyQueueLock, LW_SHARED); > /* Assert checks that we have a valid state entry */ > Assert(MyProcPid =3D=3D QUEUE_BACKEND_PID(MyProcNumber)); > + QUEUE_BACKEND_WAKEUP_PENDING(MyProcNumber) =3D false; > ``` > > This piece of code originally only read the shared memory, so it can=20 > use LW_SHARED lock mode, but now it writes to the shared memory, do we=20 > need to change the lock mode to =E2=80=9Cexclusive=E2=80=9D? No, LW_SHARED is sufficient here, since the backend only modifies its ow= n state, and no other backend could do that, without holding an exclusive lock. > 6 > ``` > +static inline void > +ChannelHashPrepareKey(ChannelHashKey *key, Oid dboid, const char *cha= nnel) > +{ > + memset(key, 0, sizeof(ChannelHashKey)); > + key->dboid =3D dboid; > + strlcpy(key->channel, channel, NAMEDATALEN); > +} > ``` > > Do we really need the memset()? If =E2=80=9Cchannel=E2=80=9D is of len= gth NAMEDATALEN,=20 > then it still results in a non-0 terminated key->channel; if channel i= s=20 > shorter than NAMEDATALEN, strlcpy will auto add a tailing =E2=80=98\0=E2= =80=99. I think=20 > previous code should have ensured length of channel should be less tha= n=20 > NAMEDATALEN. Yes, I think we need memset, since I fear that when the hash table keys are compared, every byte of the struct might be inspected, so without zero-initializing it, there could be unused bytes after the null terminator, that could then cause logically identical keys to be wrongly considered different. I haven't checked the implementation though, but my gut feeling says it's better to be a bit paranoid here. > 7 > ``` > * > * Resist the temptation to make this really large. While that would= save > * work in some places, it would add cost in others. In particular, = this > @@ -246,6 +280,7 @@ typedef struct QueueBackendStatus > Oid dboid; /* backend's database OID, or InvalidOid */ > ProcNumber nextListener; /* id of next listener, or INVALID_PROC_NUMB= ER */ > QueuePosition pos; /* backend has read queue up to here */ > + bool wakeup_pending; /* signal sent but not yet processed */ > } QueueBackendStatus; > ``` > > In the same structure, rest of fields are all in camel case, I think=20 > it=E2=80=99s better to rename the new field to =E2=80=9CwakeupPending=E2= =80=9D. > > 8 > ``` > @@ -288,11 +323,91 @@ typedef struct AsyncQueueControl > ProcNumber firstListener; /* id of first listener, or > * INVALID_PROC_NUMBER */ > TimestampTz lastQueueFillWarn; /* time of last queue-full msg */ > + dsa_handle channel_hash_dsa; > + dshash_table_handle channel_hash_dsh; > QueueBackendStatus backend[FLEXIBLE_ARRAY_MEMBER]; > ``` > > Same as 7, but in this case, type names are not camel case, maybe okay=20 > for field names. I don=E2=80=99t have a strong opinion here. I've did a major renaming of all new code, to better match the casing st= yle. It seems like helper functions and fields areNamedLikeThis, while API-functions AreNamedLikeThis. If we don't like this naming, I'm happy to change it again, please advis= e. /Joel