public inbox for [email protected]
help / color / mirror / Atom feedFrom: Joel Jacobson <[email protected]>
To: Thomas Munro <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Rishu Bagga <[email protected]>
Subject: Re: Optimize LISTEN/NOTIFY
Date: Thu, 07 Aug 2025 02:16:05 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<CAK80=jhmE40KVqQ3ho37MArS7cAED1p9m7uikDxcnDmqdW7t8A@mail.gmail.com>
<[email protected]>
<[email protected]>
<CA+hUKGLrMGkWDB0cwTa0RqD+AF7O-Ywgck8aVYKwOQnZgYRRug@mail.gmail.com>
<[email protected]>
On Thu, Jul 24, 2025, at 23:03, Joel Jacobson wrote:
> * 0001-Optimize-LISTEN-NOTIFY-signaling-with-a-lock-free-at.patch
> * 0002-Optimize-LISTEN-NOTIFY-wakeup-by-replacing-signal-wi.patch
I'm withdrawing the latest patches, since they won't fix the scalability
problems, but only provide some performance improvements by eliminating
redundant IPC signalling. This could also be improved outside of
async.c, by optimizing ProcSignal [1] or removing ProcSignal as
"Interrupts vs Signals" [2] is working on.
There seems to be two different scalability problems, that appears to be
orthogonal:
First, it's the thundering herd problems that I tried to solve initially
in this thread, by introducing a hash table in shared memory, to keep
track of what backends listen to what channels, to avoid immediate
wakeup of all listening backends for every notification.
Second, it's the heavyweight lock in PreCommit_Notify(), that prevents
parallelism of NOTIFY. Tom Lane has an idea [3] on how to improve this.
My perf+pgbench experiments indicate that out of these two different
scalability problems, if one or the other is the bottleneck depends on
the workload.
I think the idea of keeping track of channels per backends has merit,
but I want to take a step back and see what others think about the idea first.
I guess my main question is if we think we should fix one problem first,
then the other, both at the same time, or only one or the other?
I've attached some benchmarks using pgbench and running postgres under
perf, which I hope can provide some insights.
/Joel
[1] https://www.postgresql.org/message-id/flat/a0b12a70-8200-4bd4-9e24-56796314bdce%40app.fastmail.com
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2B3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ%4...
[3] https://www.postgresql.org/message-id/1878165.1752858390%40sss.pgh.pa.us
Attachments:
[text/markdown] listen_notify_pgbench_perf.md (18.4K, 2-listen_notify_pgbench_perf.md)
download | inline diff:
# LISTEN/NOTIFY scalability benchmark
## Table of Contents
- [Overview](#overview)
- [master (b5c53b4)](#master-b5c53b4)
- [1 x NOTIFY channel_1 to 1 x LISTEN channel_1](#1-x-notify-channel_1-to-1-x-listen-channel_1)
- [100 x NOTIFY channel_1 to 1 x LISTEN channel_1](#100-x-notify-channel_1-to-1-x-listen-channel_1)
- [1 x NOTIFY channel_1 to 100 x LISTEN channel_1](#1-x-notify-channel_1-to-100-x-listen-channel_1)
- [100 x NOTIFY channel_1 to 100 x LISTEN channel_1](#100-x-notify-channel_1-to-100-x-listen-channel_1)
- [100 x NOTIFY channel_:client_id to 100 x LISTEN channel_:client_id](#100-x-notify-channel_client_id-to-100-x-listen-channel_client_id)
- [master (b5c53b4) without heavyweight lock](#master-b5c53b4-without-heavyweight-lock)
- [1 x NOTIFY channel_1 to 1 x LISTEN channel_1](#1-x-notify-channel_1-to-1-x-listen-channel_1-1)
- [100 x NOTIFY channel_1 to 1 x LISTEN channel_1](#100-x-notify-channel_1-to-1-x-listen-channel_1-1)
- [1 x NOTIFY channel_1 to 100 x LISTEN channel_1](#1-x-notify-channel_1-to-100-x-listen-channel_1-1)
- [100 x NOTIFY channel_1 to 100 x LISTEN channel_1](#100-x-notify-channel_1-to-100-x-listen-channel_1-1)
- [100 x NOTIFY channel_:client_id to 100 x LISTEN channel_:client_id](#100-x-notify-channel_client_id-to-100-x-listen-channel_client_id-1)
- [Scripts](#scripts)
## Overview
The goal of this benchmark is to get a better understanding of
how {a single, a hundred} concurrent NOTIFY backends in combination
with {a single, a hundred} concurrent LISTEN backends, affect the
pgbench tps, and using perf to understand what the bottleneck for
each workload scenario is.
The benchmark has been run on Ubuntu 24.04.2 LTS running in a UTM
virtual machine on a Apple M3 Max 128GB RAM.
In the perf results, a drill-down from PostgresMain is shown,
where the largest branch is expanded, down to the syscall,
to get an idea of what dominates.
## master (b5c53b4)
### 1 x NOTIFY channel_1 to 1 x LISTEN channel_1
```
$ ./listen_script 1
$ pgbench -f ~/notify_channel_1.sql -c 1 -j 1 -T 60 -n bench
tps = 11544.902331 (without initial connection time)
- 98.90% 0.25% postgres postgres [.] PostgresMain
- 98.64% PostgresMain
- 39.59% exec_simple_query
- 32.36% CommitTransactionCommand
- 32.20% CommitTransaction
- 23.15% AtCommit_Notify
+ 22.05% kill
+ 0.83% AllocSetAllocFromNewBlock
+ 3.74% PreCommit_Notify
+ 1.30% ResourceOwnerReleaseInternal
+ 0.67% XactLogCommitRecord
+ 2.83% pg_parse_query
+ 1.66% start_xact_command
0.53% PortalRun
+ 32.54% pq_getbyte
+ 24.39% socket_flush
0.54% SetCurrentStatementStartTimestamp
```
### 100 x NOTIFY channel_1 to 1 x LISTEN channel_1
```
$ ./listen_script 1
$ pgbench -f ~/notify_channel_1.sql -c 100 -j 100 -T 60 -n bench
tps = 7494.324353 (without initial connection time)
- 99.23% 0.21% postgres postgres [.] PostgresMain
- 99.02% PostgresMain
- 61.83% exec_simple_query
- 57.73% CommitTransactionCommand
- 57.61% CommitTransaction
- 27.40% ResourceOwnerReleaseInternal
- 27.17% ProcReleaseLocks
- 27.09% LockReleaseAll
- 21.82% ProcLockWakeup
+ 19.28% kill
0.89% LockCheckConflicts
+ 3.75% LWLockRelease
+ 14.03% AtCommit_Notify
+ 11.77% PreCommit_Notify
+ 2.31% XactLogCommitRecord
+ 1.76% pg_parse_query
+ 0.70% start_xact_command
+ 22.45% socket_flush
+ 13.29% pq_getbyte
```
### 1 x NOTIFY channel_1 to 100 x LISTEN channel_1
```
$ for n in `seq 1 100` ; do ./listen_script 1 ; done
$ pgbench -f ~/notify_channel_1.sql -c 1 -j 1 -T 60 -n bench
tps = 798.089837 (without initial connection time)
- 99.75% 0.02% postgres postgres [.] PostgresMain
- 99.73% PostgresMain
- 62.41% pq_getbyte
- pq_recvbuf
- 62.40% secure_read
- 42.07% ProcessClientReadInterrupt
- 41.90% ProcessNotifyInterrupt
- 34.08% socket_flush
- 34.03% internal_flush_buffer
- 34.02% secure_write
- 27.75% WaitEventSetWait
+ 18.20% epoll_pwait
+ 7.59% WaitEventSetWait
+ 1.79% drain
+ 5.81% __send
+ 3.02% CommitTransactionCommand
+ 2.90% asyncQueueReadAllNotifications
+ 1.44% StartTransactionCommand
+ 19.31% WaitEventSetWait
+ 0.82% recv
+ 36.48% exec_simple_query
+ 0.62% socket_flush
```
### 100 x NOTIFY channel_1 to 100 x LISTEN channel_1
```
$ for n in `seq 1 100` ; do ./listen_script 1 ; done
$ pgbench -f ~/notify_channel_1.sql -c 100 -j 100 -T 60 -n bench
tps = 1314.302478 (without initial connection time)
- 99.78% 0.02% postgres postgres [.] PostgresMain
- 99.76% PostgresMain
- 50.35% pq_getbyte
- 50.34% pq_recvbuf
- 50.34% secure_read
- 29.69% ProcessClientReadInterrupt
- 29.54% ProcessNotifyInterrupt
- 22.70% socket_flush
- 22.63% internal_flush_buffer
- 22.62% secure_write
- 17.92% WaitEventSetWait
+ 12.84% epoll_pwait
+ 3.88% WaitEventSetWait
+ 1.11% drain
+ 4.35% __send
+ 2.63% CommitTransactionCommand
+ 2.58% asyncQueueReadAllNotifications
+ 1.25% StartTransactionCommand
+ 19.25% WaitEventSetWait
+ 1.23% recv
+ 48.19% exec_simple_query
+ 0.94% socket_flush
```
### 100 x NOTIFY channel_:client_id to 100 x LISTEN channel_:client_id
```
$ for n in `seq 1 100` ; do ./listen_script $n ; done
$ pgbench -f ~/notify_channel_client_id.sql -c 100 -j 100 -T 60 -n bench
tps = 1419.322468 (without initial connection time)
- 99.81% 0.02% postgres postgres [.] PostgresMain
- 99.79% PostgresMain
- 50.42% pq_getbyte
- 50.41% pq_recvbuf
- 50.41% secure_read
- 35.41% WaitEventSetWait
+ 26.01% epoll_pwait
+ 7.12% WaitEventSetWait
+ 2.06% drain
+ 10.18% ProcessClientReadInterrupt
+ 4.39% recv
+ 48.15% exec_simple_query
+ 0.97% socket_flush
```
## master (b5c53b4) without heavyweight lock
This is just to give an idea of how the heavyweight lock affects the
scalability.
```diff
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bd37d5beb..47dfe42c9c 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -919,8 +919,6 @@ PreCommit_Notify(void)
* (Historical note: before PG 9.0, a similar lock on "database 0" was
* used by the flatfiles mechanism.)
*/
- LockSharedObject(DatabaseRelationId, InvalidOid, 0,
- AccessExclusiveLock);
/* Now push the notifications into the queue */
nextNotify = list_head(pendingNotifies->events);
```
### 1 x NOTIFY channel_1 to 1 x LISTEN channel_1
```
$ ./listen_script 1
$ pgbench -f ~/notify_channel_1.sql -c 1 -j 1 -T 60 -n bench
tps = 11645.734899 (without initial connection time)
- 99.28% 0.24% postgres postgres [.] PostgresMain
- 99.04% PostgresMain
- 39.80% exec_simple_query
- 32.54% CommitTransactionCommand
- 32.38% CommitTransaction
- 23.53% AtCommit_Notify
+ 22.41% kill
+ 0.86% AllocSetAllocFromNewBlock
+ 3.57% PreCommit_Notify
+ 1.09% ResourceOwnerReleaseInternal
+ 0.73% XactLogCommitRecord
+ 2.66% pg_parse_query
+ 1.58% start_xact_command
0.59% PortalRun
0.57% CreatePortal
+ 32.42% pq_getbyte
+ 24.52% socket_flush
0.55% SetCurrentStatementStartTimestamp
```
### 100 x NOTIFY channel_1 to 1 x LISTEN channel_1
```
$ ./listen_script 1
$ pgbench -f ~/notify_channel_1.sql -c 100 -j 100 -T 60 -n bench
tps = 121615.034209 (without initial connection time)
- 99.81% 0.20% postgres postgres [.] PostgresMain
- 99.61% PostgresMain
- 67.21% exec_simple_query
- 57.33% CommitTransactionCommand
- 57.23% CommitTransaction
- 38.04% AtCommit_Notify
+ 34.87% kill
+ 1.22% LWLockRelease
+ 0.85% AllocSetAllocFromNewBlock
+ 0.76% LWLockAcquire
+ 9.06% PreCommit_Notify
+ 2.54% ResourceOwnerReleaseInternal
+ 1.53% XactLogCommitRecord
1.44% TransactionIdSetTreeStatus
+ 0.70% GetCurrentTransactionStopTimestamp
0.57% LWLockRelease
0.54% ProcArrayEndTransaction
+ 0.53% MemoryContextResetOnly
+ 4.81% pg_parse_query
+ 1.33% start_xact_command
0.85% CreatePortal
0.54% PortalRun
+ 16.14% socket_flush
+ 14.43% pq_getbyte
```
### 1 x NOTIFY channel_1 to 100 x LISTEN channel_1
```
$ for n in `seq 1 100` ; do ./listen_script 1 ; done
$ pgbench -f ~/notify_channel_1.sql -c 1 -j 1 -T 60 -n bench
tps = 801.370038 (without initial connection time)
- 99.79% 0.02% postgres postgres [.] PostgresMain
- 99.77% PostgresMain
- 62.94% pq_getbyte
- pq_recvbuf
- 62.94% secure_read
- 42.29% ProcessClientReadInterrupt
- 42.13% ProcessNotifyInterrupt
- 33.99% socket_flush
- 33.93% internal_flush_buffer
- 33.92% secure_write
- 27.73% WaitEventSetWait
+ 18.15% epoll_pwait
+ 7.59% WaitEventSetWait
+ 1.83% drain
+ 5.75% __send
+ 3.21% CommitTransactionCommand
+ 3.00% asyncQueueReadAllNotifications
+ 1.43% StartTransactionCommand
+ 19.59% WaitEventSetWait
+ 0.86% recv
+ 36.10% exec_simple_query
```
### 100 x NOTIFY channel_1 to 100 x LISTEN channel_1
```
$ for n in `seq 1 100` ; do ./listen_script 1 ; done
$ pgbench -f ~/notify_channel_1.sql -c 100 -j 100 -T 60 -n bench
tps = 4095.709407 (without initial connection time)
- 99.79% 0.05% postgres postgres [.] PostgresMain
- 99.73% PostgresMain
- 54.22% exec_simple_query
- 52.98% CommitTransactionCommand
- 52.95% CommitTransaction
- 50.49% AtCommit_Notify
+ 49.85% kill
1.17% PreCommit_Notify
+ 43.25% pq_getbyte
+ 1.75% socket_flush
```
### 100 x NOTIFY channel_:client_id to 100 x LISTEN channel_:client_id
```
$ for n in `seq 1 100` ; do ./listen_script $n ; done
$ pgbench -f ~/notify_channel_client_id.sql -c 100 -j 100 -T 60 -n bench
tps = 3354.541290 (without initial connection time)
- 99.87% 0.03% postgres postgres [.] PostgresMain
- 99.85% PostgresMain
- 62.30% exec_simple_query
- 61.72% CommitTransactionCommand
- 61.71% CommitTransaction
- 60.24% AtCommit_Notify
+ 59.83% kill
0.80% PreCommit_Notify
+ 36.09% pq_getbyte
+ 1.21% socket_flush
```
## Scripts
The following `expect` script was used to spawn LISTEN connections,
that were kept open, and that did SELECT 1 every second,
to receive the async notifications, to make it more realistic:
`listen_script`:
```
#!/usr/bin/expect -f
set timeout -1
log_user 0 ;# suppress stdout/stderr
if {$argc != 1} {
puts stderr "Usage: $argv0 <channel>"
exit 64 ;# EX_USAGE
}
set channel [lindex $argv 0]
if {[fork] != 0} { exit }
disconnect ;# stdio → /dev/null
spawn /home/joel/pg-debug/bin/psql -q bench
sleep 1
send "LISTEN channel_$channel;\r"
proc heartbeat {} {
if {[catch {send "SELECT 1;\r"}]} { exit 2 } ;# PTY gone → exit
after 1000 heartbeat
}
after 1000 heartbeat
while 1 {
expect {
eof { exit 0 }
"You are currently not connected to a database." { exit 1 }
-re {.*\r?\n} { exp_continue }
}
}
```
`pgbench` scripts:
notify_channel_1.sql:
```
NOTIFY channel_1;
```
notify_channel_client_id.sql:
```
NOTIFY channel_:client_id;
```
[application/pdf] listen_notify_pgbench_perf.pdf (170.5K, 3-listen_notify_pgbench_perf.pdf)
download
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: Re: Optimize LISTEN/NOTIFY
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox