Date: Wed, 16 Jul 2025 09:00:05 +0200
From: "Joel Jacobson"
To: "Rishu Bagga"
Cc: pgsql-hackers
References: <6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com>
 <165530.1752362320@sss.pgh.pa.us>
 <02a7cd37-e2fc-4212-8b19-f8c239c95fb8@app.fastmail.com>
Subject: Re: Optimize LISTEN/NOTIFY

On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
> Hi Joel,
>
> Thanks for sharing the patch.
> I have a few questions based on a cursory first look.
>
>> If a single listener is found, we signal only that backend.
>> Otherwise, we fall back to the existing broadcast behavior.
>
> The idea of not wanting to wake up all backends makes sense to me,
> but I don't understand why we want this optimization only for the case
> where there is a single backend listening on a channel.
>
> Is there a pattern of usage in LISTEN/NOTIFY where users typically
> have either just one or several backends listening on a channel?
>
> If we are doing this optimization, why not maintain a list of backends
> for each channel, and only wake up those channels?

Thanks for the thoughtful question. You've hit on the central design
trade-off in this optimization: how to provide targeted signaling for
some workloads without degrading performance for others.

While we don't have telemetry on real-world usage patterns of
LISTEN/NOTIFY, it seems likely that most applications fall into one of
three categories, which I've been thinking of in networking terms:

1. Broadcast-style ("hub mode")

Many backends listening on the *same* channel (e.g., for cache
invalidation). The current implementation is already well-optimized for
this, behaving like an Ethernet hub that broadcasts to all ports. Waking
all listeners is efficient because they all need the message.

2. Targeted notifications ("switch mode")

Each backend listens on its own private channel (e.g., for session
events or worker queues).
This is where the current implementation scales poorly, as every NOTIFY
wakes up all listeners regardless of relevance. My patch is designed to
make this behave like an efficient Ethernet switch.

3. Selective multicast-style ("group mode")

A subset of backends shares a channel, but not all. This is the tricky
middle ground. Your question, "why not maintain a list of backends for
each channel, and only wake up those channels?" is exactly the right one
to ask.

A full listener list seems like the obvious path to optimizing for *all*
cases. However, the devil is in the details of concurrency and
performance. Managing such a list would require heavier locking, which
would create a new bottleneck and degrade the scalability of
LISTEN/UNLISTEN operations, especially in the "hub mode" case where many
backends rapidly subscribe to the same popular channel.

This patch makes a deliberate architectural choice: prioritize a
substantial, low-risk win for "switch mode" while protecting the
performance of "hub mode".

It introduces a targeted fast path for single-listener channels and
cleanly falls back to the existing, well-performing broadcast model for
everything else.

This brings us back to "group mode", which remains an open optimization
problem. A possible approach could be to track listeners up to a small
threshold *K* (e.g., store up to 4 ProcNumbers in the hash entry). If
the count exceeds *K*, we would flip a "broadcast" flag and revert to
hub-mode behavior.

However, this path has two drawbacks:

1. Performance penalty for hub mode

With the current patch, after the second listener joins a channel, the
has_multiple_listeners flag is set. Every subsequent listener can
acquire a shared lock, see that the flag is true, and immediately
continue. This is a highly concurrent, read-only operation that does not
require mutating shared state.
In contrast, the K-listener approach would force every new listener
(from the third up to the K-th) to acquire an exclusive lock to mutate
the shared listener array. This would serialize LISTEN operations on
popular channels, creating the very contention point this patch avoids
and directly harming the hub-mode use case that currently works well.

2. Uncertainty

Compounding this, without clear data on typical "group" sizes, choosing
a value for *K* is a shot in the dark. A small *K* might not help much,
while a large *K* would increase the shared memory footprint and worsen
the serialization penalty.

For these reasons, attempting to build a switch that also optimizes for
multicast risks undermining the architectural clarity and performance of
both the switch and hub models.

This patch, therefore, draws a clean line. It provides a precise,
low-cost path for switch-mode workloads and preserves the existing,
well-performing path for hub-mode workloads. While this leaves "group
mode" unoptimized for now, it ensures we make two common use cases
better without making any use case worse. The new infrastructure is
flexible, leaving the door open should a better approach for "group
mode" emerge in the future, one that doesn't compromise the other two.

Updated benchmarks comparing master with
0001-optimize_listen_notify-v3.patch:

https://github.com/joelonsql/pg-bench-listen-notify/raw/master/plot.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_connections_equal_jobs.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_fixed_connections.png

I've not included the benchmark CSV data in this mail, since it's fairly
large (160 kB) and I couldn't see any significant performance changes
since v2.

/Joel
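
P.S. To make the locking argument above concrete, here is a toy Python
model. It is not code from the patch: the class names, the "shared" and
"exclusive" strings, and the helper are made up for illustration, and it
assumes that any step which mutates the per-channel entry needs an
exclusive lock while a read-only check needs only a shared one. It also
assumes that a NOTIFY on a channel with no recorded listener takes the
broadcast path.

```python
from typing import List, Optional, Set


class FlagChannel:
    """Toy model: single-listener fast path plus has_multiple_listeners."""

    def __init__(self) -> None:
        self.listener: Optional[int] = None
        self.has_multiple_listeners = False

    def listen(self, backend: int) -> str:
        """Register a listener and return the lock mode this step needs."""
        if self.has_multiple_listeners or self.listener == backend:
            return "shared"  # read-only check, nothing to mutate
        if self.listener is None:
            self.listener = backend  # first listener: record it
        else:
            self.has_multiple_listeners = True  # second listener: flip flag
        return "exclusive"

    def targets(self, all_listening: Set[int]) -> Set[int]:
        """Backends a NOTIFY on this channel would signal."""
        if self.listener is not None and not self.has_multiple_listeners:
            return {self.listener}  # switch mode: wake one backend
        return set(all_listening)  # otherwise: existing broadcast


class ThresholdChannel:
    """Toy model: track up to k listeners, then fall back to broadcast."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.listeners: List[int] = []
        self.broadcast = False

    def listen(self, backend: int) -> str:
        if self.broadcast or backend in self.listeners:
            return "shared"
        if len(self.listeners) < self.k:
            self.listeners.append(backend)  # mutate the shared array
        else:
            self.broadcast = True  # listener k+1: give up tracking
            self.listeners.clear()
        return "exclusive"


def exclusive_steps(channel, n_backends: int) -> int:
    """Count LISTEN calls that mutate shared state as n backends join."""
    modes = [channel.listen(b) for b in range(n_backends)]
    return modes.count("exclusive")
```

With ten backends joining the same channel, exclusive_steps(FlagChannel(), 10)
gives 2, while exclusive_steps(ThresholdChannel(4), 10) gives 5. In this
model the number of serialized LISTEN steps grows with K, whereas the
flag approach stays at two no matter how many listeners join.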