Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v6L5R-0033QS-E3 for pgsql-hackers@arkaria.postgresql.org; Wed, 08 Oct 2025 03:43:33 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1v6L5O-006fXH-Jb for pgsql-hackers@arkaria.postgresql.org; Wed, 08 Oct 2025 03:43:31 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v6L5O-006fX4-5U for pgsql-hackers@lists.postgresql.org; Wed, 08 Oct 2025 03:43:31 +0000 Received: from mail-pj1-x1035.google.com ([2607:f8b0:4864:20::1035]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1v6L5M-000y7k-2u for pgsql-hackers@postgresql.org; Wed, 08 Oct 2025 03:43:30 +0000 Received: by mail-pj1-x1035.google.com with SMTP id 98e67ed59e1d1-33082c95fd0so7656746a91.1 for ; Tue, 07 Oct 2025 20:43:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759895005; x=1760499805; darn=postgresql.org; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=3hLfb9N32dzOLGMIFRl8nyR0i5PHRL8qrDemP8T6Gks=; b=mX+jfYaa+hZnyipCX37tFjuoRl/Y1OazOkcZ7H+Es4uJ7pIeEnRUj1a73ut2aHiUWc Nz5oUz5BpDn60zornVFdiPJywnEERd9E9+Puv4uMkxN6k4mIQmWL7TxsU09PCWub2x7V NEgex9Stfb3B/HZa0FgYLFa9O3NB/ZRaD/kcYYdc+Ud1hbiQ69DeWI9KTHJfrpE1ZZ98 xlbfCcjR9yHS9O07ofrLvmIMpjzpk6T5qYx6lzidVI2wQRKwJ/tUrlSqilIodvfuAFkP XJW2fwo8rCwtGaxnSoL9+/g6jQeYsKWDTrdskTYuxCpw3QveP4mbKHlRG0tXQ0WVlTLv CewQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759895005; x=1760499805; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3hLfb9N32dzOLGMIFRl8nyR0i5PHRL8qrDemP8T6Gks=; b=IkQ0162Mbvwr4J5b7psAgwe2Nc00qPmS+4AJzXtUIJXKRStVWLZExbmnbDtWA2z6Qp EHxIEJT2P7C2/6EeNUz9BMdycKLl2ey1sy0fh/S9nigtA4Gnbmoq332gHLwmS+c1H3QT QPz28zziSmn6niu1/5BdBeFuCRTUGl2Zc0iMC3gCUTLBYpM+cGHfV8aQTayE5y6fReIR C+IHMNT/HHBwS23zmREjIfJmxWnOQCn9QPg68E+RkyXuno1NcPSBC8K8fc89iYwJZJNj 4joCjRfWTXCMQybNFe+uMjJchzUvazIqZbUwzqoRgl7oJNbvRnl8YTdEBWjkapzaVcn2 mG3Q== X-Forwarded-Encrypted: i=1; AJvYcCUBMcQsWCmbb9A/jlsCH/2LMt0eFdCUqdQXIoVA1yPDrpe9baQeNeylukJZ2jkBKKVgvXd3Ndia7mQZltDA@postgresql.org X-Gm-Message-State: AOJu0Yw+1+QJGPyNOirz2+gp6Dot+iSyEE8EP6sigiV4uDMOCzYBSl6o vQGxLEvQZLh8z/WgNpiIljXjQkVhwAG10Qj7HIEXIfOHpmZ5ko1DYnB+ X-Gm-Gg: ASbGncuvn7g9BPUp12uAsBOZxtVDt9IKuAtrOUt/VzAd3ZLY46TgU4RnK62+EwbFUYO 7pE8UOfdUlhEbRiZytSwd59DvbdxgAQ8/CXSttOzDHdbSrGUMR1gA0YfX/GarQkmkW6QLnPoXqH niymz24WWcQTdMDpnpMTVh+CEZDdgKeYvl/HCquSB6fjBOb75ks2mXkDj6ktP6s2RoYCebnomka gZuFUwgvi0fAfRoCNp+VTkza9ln92sv7jRRM6dDQkbOCdXBxwGJ5O1AJVcbg+kM8L3IkCyemRTR 6rQOIOKZfBVtgypwWSUX5zqmo41W9XJqyDVTNBQBHihbYCuOXX/XrESgC34t8Hdw2BniLCq/riH dJjM4YjPql5Z6UgXJfFo+4pmPbzQlHXGdRgqIG/wxtm51x3vWXyh1EWmPJA== X-Google-Smtp-Source: AGHT+IGxbY3u9lX3WApRG/t9VVrcpULmyGYxDpTN4tJPU5H5DrXSbhQeHTguSB/qdV2pw7dyPaPkgQ== X-Received: by 2002:a17:903:2b0e:b0:28e:7567:3c45 with SMTP id d9443c01a7336-2902735649emr19098805ad.9.1759895005355; Tue, 07 Oct 2025 20:43:25 -0700 (PDT) Received: from smtpclient.apple ([170.178.170.211]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-28e8d1108adsm181743975ad.21.2025.10.07.20.43.22 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 07 Oct 2025 20:43:24 -0700 (PDT) From: Chao Li Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_A7B14051-948A-4A2A-A300-452E995B7FEC" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\)) Subject: Re: Optimize LISTEN/NOTIFY Date: Wed, 8 Oct 2025 11:43:09 +0800 In-Reply-To: <8aeae418-92a6-4bbd-9c06-9574c79e59f7@app.fastmail.com> Cc: Tom Lane , Matheus Alcantara , pgsql-hackers To: Joel Jacobson References: <6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com> <165530.1752362320@sss.pgh.pa.us> <02a7cd37-e2fc-4212-8b19-f8c239c95fb8@app.fastmail.com> <96f00bf1-cc9d-4520-9d02-9e14e7767c88@app.fastmail.com> <30c2aa7d-dd6c-4b68-a2e4-f217a1a34acf@app.fastmail.com> <0b4d402a-9ac2-4aa8-acf8-8231dbe579ea@app.fastmail.com> <3095599.1758644879@sss.pgh.pa.us> <0dc6a2cc-5216-4dc1-9dd2-430cafc6095b@app.fastmail.com> <52CC167F-763B-4ECA-B0B4-DAB381816828@gmail.com> <9186C6D0-F7A9-482A-9183-89E530B57E36@gmail.com> <1073593.1759423179@sss.pgh.pa.us> <4bd5e6c4-6fa7-44bb-869d-59a32a331fa8@app.fastmail.com> <85828f29-e72e-4400-94f3-9a69bc8dc239@app.fastmail.com> <2495353.1759860890@sss.pgh.pa.us> <8aeae418-92a6-4bbd-9c06-9574c79e59f7@app.fastmail.com> X-Mailer: Apple Mail (2.3826.700.81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_A7B14051-948A-4A2A-A300-452E995B7FEC Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 After several rounds of reviewing, the code is already very good. I just = got a few small comments: > On Oct 8, 2025, at 03:26, Joel Jacobson wrote: >=20 >=20 > /Joel 1 ``` + channels =3D GetPendingNotifyChannels(); + LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE); - for (ProcNumber i =3D QUEUE_FIRST_LISTENER; i !=3D = INVALID_PROC_NUMBER; i =3D QUEUE_NEXT_LISTENER(i)) + foreach(lc, channels) ``` I don=E2=80=99t see where =E2=80=9Cchannels=E2=80=9D is freed. = GetPendingNotifyChannels() creates a list of Nodes, both the list and = Nodes the list points to should be freed. 2 ``` + foreach(lc, channels) { - int32 pid =3D QUEUE_BACKEND_PID(i); - QueuePosition pos; + char *channel =3D strVal(lfirst(lc)); + ChannelEntry *entry; + ProcNumber *listeners; + ChannelHashKey key; =20 - Assert(pid !=3D InvalidPid); - pos =3D QUEUE_BACKEND_POS(i); - if (QUEUE_BACKEND_DBOID(i) =3D=3D MyDatabaseId) + if (channel_hash =3D=3D NULL) + entry =3D NULL; + else ``` I wonder whether or not =E2=80=9Cchannel_hash=E2=80=9D can be NULL here? = Maybe possible if a channel is un-listened while the event is pending? So, maybe add a comment here to explain the logic. 3 The same piece of code as 2. I think the code can be optimized a little bit. First, we can initialize = entry to NULL, then we don=E2=80=99t the if-else. Second, =E2=80=9Ckey=E2=80= =9D is only used for dshash_find(), so it can defined where it is used. foreach(lc, channels) { char *channel =3D strVal(lfirst(lc)); ChannelEntry *entry =3D NULL; ProcNumber *listeners; //ChannelHashKey key; if (channel_hash !=3D NULL) { ChannelHashKey key; ChannelHashPrepareKey(&key, MyDatabaseId, channel); entry =3D dshash_find(channel_hash, &key, false); } if (entry =3D=3D NULL) continue; /* No listeners registered for this = channel */ 4 ``` + if (signaled[i] || = QUEUE_BACKEND_WAKEUP_PENDING(i)) + continue; ``` I wonder if =E2=80=9Csignaled[i]=E2=80=9D is a duplicate flag of = "QUEUE_BACKEND_WAKEUP_PENDING(i)=E2=80=9D?=20 I understand signaled is local, and QUEUE_BACKEND_WAKEUP_PENDING is in = shared memory and may be set by other processes, but in local, when = signaled[I] is set, QUEUE_BACKEND_WAKEUP_PENDING(i) is also set. And = because of NotifyQueueLock, other process should not be able to cleanup = the flag. But if =E2=80=9Csignals=E2=80=9D is really needed, maybe we can use = Bitmapset (src/backend/nodes/bitmapset.c), that would use 1/8 of = memories comparing to the bool array. 5 ``` /* @@ -1865,6 +2087,7 @@ asyncQueueReadAllNotifications(void) LWLockAcquire(NotifyQueueLock, LW_SHARED); /* Assert checks that we have a valid state entry */ Assert(MyProcPid =3D=3D QUEUE_BACKEND_PID(MyProcNumber)); + QUEUE_BACKEND_WAKEUP_PENDING(MyProcNumber) =3D false; ``` This piece of code originally only read the shared memory, so it can use = LW_SHARED lock mode, but now it writes to the shared memory, do we need = to change the lock mode to =E2=80=9Cexclusive=E2=80=9D? 6 ``` +static inline void +ChannelHashPrepareKey(ChannelHashKey *key, Oid dboid, const char = *channel) +{ + memset(key, 0, sizeof(ChannelHashKey)); + key->dboid =3D dboid; + strlcpy(key->channel, channel, NAMEDATALEN); +} ``` Do we really need the memset()? If =E2=80=9Cchannel=E2=80=9D is of = length NAMEDATALEN, then it still results in a non-0 terminated = key->channel; if channel is shorter than NAMEDATALEN, strlcpy will auto = add a tailing =E2=80=98\0=E2=80=99. I think previous code should have = ensured length of channel should be less than NAMEDATALEN. 7 ``` * * Resist the temptation to make this really large. While that would = save * work in some places, it would add cost in others. In particular, = this @@ -246,6 +280,7 @@ typedef struct QueueBackendStatus Oid dboid; /* backend's = database OID, or InvalidOid */ ProcNumber nextListener; /* id of next listener, or = INVALID_PROC_NUMBER */ QueuePosition pos; /* backend has read = queue up to here */ + bool wakeup_pending; /* signal sent but not yet = processed */ } QueueBackendStatus; ``` In the same structure, rest of fields are all in camel case, I think = it=E2=80=99s better to rename the new field to =E2=80=9CwakeupPending=E2=80= =9D. 8 ``` @@ -288,11 +323,91 @@ typedef struct AsyncQueueControl ProcNumber firstListener; /* id of first listener, or * = INVALID_PROC_NUMBER */ TimestampTz lastQueueFillWarn; /* time of last queue-full msg = */ + dsa_handle channel_hash_dsa; + dshash_table_handle channel_hash_dsh; QueueBackendStatus backend[FLEXIBLE_ARRAY_MEMBER]; ``` Same as 7, but in this case, type names are not camel case, maybe okay = for field names. I don=E2=80=99t have a strong opinion here. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ --Apple-Mail=_A7B14051-948A-4A2A-A300-452E995B7FEC Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 After several = rounds of reviewing, the code is already very good. I just got a few = small comments:

On Oct 8, 2025, at 03:26, Joel Jacobson = <joel@compiler.org> wrote:


/Joel<optimize_listen_notify= -v11.patch>


1
```
+ channels =3D = GetPendingNotifyChannels();
+
  = LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
- for = (ProcNumber i =3D QUEUE_FIRST_LISTENER; i !=3D INVALID_PROC_NUMBER; i =3D = QUEUE_NEXT_LISTENER(i))
+ foreach(lc, = channels)
```

I don=E2=80=99t = see where =E2=80=9Cchannels=E2=80=9D is freed. = GetPendingNotifyChannels() creates a list of Nodes, both the list and = Nodes the list points to should be = freed.

2
```
+ = foreach(lc, channels)
  = {
- int32 = pid =3D QUEUE_BACKEND_PID(i);
- = QueuePosition pos;
+ char   = *channel =3D strVal(lfirst(lc));
+ ChannelEntry = *entry;
+ ProcNumber = *listeners;
+ ChannelHashKey = key;
 
- Assert(pid !=3D = InvalidPid);
- pos =3D = QUEUE_BACKEND_POS(i);
- if = (QUEUE_BACKEND_DBOID(i) =3D=3D MyDatabaseId)
+ = if (channel_hash =3D=3D NULL)
+ = entry =3D NULL;
+ = else
```

I wonder = whether or not =E2=80=9Cchannel_hash=E2=80=9D can be NULL here? Maybe = possible if a channel is un-listened while the event is = pending?

So, maybe add a comment here to = explain the logic.

3
The same piece = of code as 2.

I think the code can be optimized = a little bit. First, we can initialize entry to NULL, then we don=E2=80=99= t the if-else. Second, =E2=80=9Ckey=E2=80=9D is only used for = dshash_find(), so it can defined where it is = used.

foreach(lc, channels)
{
char *channel = =3D strVal(lfirst(lc));
ChannelEntry *entry =3D NULL;
ProcNumber *listeners;
= //ChannelHashKey = key;

if (channel_hash !=3D NULL)
{
ChannelHashKey key;
ChannelHashPrepareKey(&key, = MyDatabaseId, channel);
entry =3D dshash_find(channel_hash, &key, = false);
}

= if (entry =3D=3D NULL)
= continue; /* No listeners registered for this = channel = */

4
```
<= div>+ = if (signaled[i] || = QUEUE_BACKEND_WAKEUP_PENDING(i))
+ = continue;
```

I wonder = if =E2=80=9Csignaled[i]=E2=80=9D is a duplicate flag of = "QUEUE_BACKEND_WAKEUP_PENDING(i)=E2=80=9D? 

= I understand signaled is local, and QUEUE_BACKEND_WAKEUP_PENDING is in = shared memory and may be set by other processes, but in local, when = signaled[I] is set, QUEUE_BACKEND_WAKEUP_PENDING(i) is also set. And = because of NotifyQueueLock, other process should not be able to cleanup = the flag.

But if =E2=80=9Csignals=E2=80=9D is = really needed, maybe we can use Bitmapset = (src/backend/nodes/bitmapset.c), that would use 1/8 of memories = comparing to the bool = array.

5
```
 /*
@@ -1865,6 +2087,7 @@ = asyncQueueReadAllNotifications(void)
  = LWLockAcquire(NotifyQueueLock, LW_SHARED);
  /* Assert = checks that we have a valid state entry */
  = Assert(MyProcPid =3D=3D = QUEUE_BACKEND_PID(MyProcNumber));
+ = QUEUE_BACKEND_WAKEUP_PENDING(MyProcNumber) =3D = false;
```

This piece of code = originally only read the shared memory, so it can use LW_SHARED lock = mode, but now it writes to the shared memory, do we need to change the = lock mode to = =E2=80=9Cexclusive=E2=80=9D?

6
```
+static inline = void
+ChannelHashPrepareKey(ChannelHashKey *key, Oid dboid, = const char *channel)
+{
+ = memset(key, 0, sizeof(ChannelHashKey));
+ = key->dboid =3D dboid;
+ strlcpy(key->channel, channel, = NAMEDATALEN);
+}
```

Do= we really need the memset()? If =E2=80=9Cchannel=E2=80=9D is of length = NAMEDATALEN, then it still results in a non-0 terminated = key->channel; if channel is shorter than NAMEDATALEN, strlcpy will = auto add a tailing =E2=80=98\0=E2=80=99. I think previous code should = have ensured length of channel should be less than = NAMEDATALEN.

7
```
 = ; *
  * Resist the temptation to make this really large. =  While that would save
  * work in some places, it = would add cost in others.  In particular, this
@@ -246,6 = +280,7 @@ typedef struct QueueBackendStatus
  Oid = dboid; = /* backend's database OID, or InvalidOid = */
  ProcNumber = nextListener; /* id of next listener, or = INVALID_PROC_NUMBER */
  QueuePosition pos; = /* backend has read queue up to here */
+ bool = wakeup_pending; /* signal sent but not yet processed = */
 } = QueueBackendStatus;
```

In the = same structure, rest of fields are all in camel case, I think it=E2=80=99s= better to rename the new field to = =E2=80=9CwakeupPending=E2=80=9D.

8
```<= /div>
@@ -288,11 +323,91 @@ typedef struct = AsyncQueueControl
  ProcNumber = firstListener; /* id of first listener, = or
  = * INVALID_PROC_NUMBER */
  = TimestampTz lastQueueFillWarn; /* time of last queue-full msg = */
+ = dsa_handle= channel_hash_dsa;
+ dshash_table_handle = channel_hash_dsh;
  QueueBackendStatus = backend[FLEXIBLE_ARRAY_MEMBER];
```

Same as 7, but in this case, type names are not camel case, maybe = okay for field names. I don=E2=80=99t have a strong opinion = here.

Best regards,
--
Chao Li (Evan)
HighGo Software = Co., = Ltd.
https://www.highgo.com/

= --Apple-Mail=_A7B14051-948A-4A2A-A300-452E995B7FEC--