Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vMBF9-0035Aa-11 for pgsql-hackers@arkaria.postgresql.org; Thu, 20 Nov 2025 20:27:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vMBF7-003kuO-32 for pgsql-hackers@arkaria.postgresql.org; Thu, 20 Nov 2025 20:27:02 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vMBF7-003kuG-24 for pgsql-hackers@lists.postgresql.org; Thu, 20 Nov 2025 20:27:01 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vMBF5-000a7Y-0x for pgsql-hackers@postgresql.org; Thu, 20 Nov 2025 20:27:01 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 5AKKQuH02067240; Thu, 20 Nov 2025 15:26:56 -0500 From: Tom Lane To: "Joel Jacobson" cc: "Chao Li" , pgsql-hackers Subject: Re: Optimize LISTEN/NOTIFY In-reply-to: References: <6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com> <165530.1752362320@sss.pgh.pa.us> <02a7cd37-e2fc-4212-8b19-f8c239c95fb8@app.fastmail.com> <96f00bf1-cc9d-4520-9d02-9e14e7767c88@app.fastmail.com> <30c2aa7d-dd6c-4b68-a2e4-f217a1a34acf@app.fastmail.com> <0b4d402a-9ac2-4aa8-acf8-8231dbe579ea@app.fastmail.com> <3095599.1758644879@sss.pgh.pa.us> <0dc6a2cc-5216-4dc1-9dd2-430cafc6095b@app.fastmail.com> <52CC167F-763B-4ECA-B0B4-DAB381816828@gmail.com> <9186C6D0-F7A9-482A-9183-89E530B57E36@gmail.com> <1073593.1759423179@sss.pgh.pa.us> <4bd5e6c4-6fa7-44bb-869d-59a32a331fa8@app.fastmail.com> <85828f29-e72e-4400-94f3-9a69bc8dc239@app.fastmail.com> <2495353.1759860890@sss.pgh.pa.us> <8aeae418-92a6-4bbd-9c06-9574c79e59f7@app.fastmail.com> <2531672.1759868124@sss.pgh.pa.us> <474efa78-337c-41cd-a73a-f845a0115109@app.fastmail.com> <2749343.1759949176@sss.pgh.pa.us> <8bfca2be-1ec0-4e15-aafb-0b7b661fe936@app.fastmail.com> <9eba307f-f2fb-48f0-9507-2e197f39ef9e@app.fastmail.com> <8c71183a-0d28-4bcf-a806-78446ff95404@app.fastmail.com> <1009807.1760476747@sss.pgh.pa.us> <1F7227F5-C33D-4E2C-8511-33F1468590D0@gmail.com> <0a5a20d3-4621-46b3-b2ab-903f63a20dea@app.fastmail.com> <6F913129-ABEF-4004-AAF3-F22FC3! 4!29AE8@gmail.com> <1547585.1760645808@sss.pgh.pa.us> <14865EB6-0BF4-462B-9072-10BDAC10C0 Comments: In-reply-to "Joel Jacobson" message dated "Wed, 19 Nov 2025 04:14:44 +0100" MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <2067238.1763670416.1@sss.pgh.pa.us> Date: Thu, 20 Nov 2025 15:26:56 -0500 Message-ID: <2067239.1763670416@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk I took a brief look through the v28 patch, and I'm fairly distressed at how much new logic has been stuffed into what's effectively a critical section. It's totally not okay for AtCommit_Notify or anything it calls to throw an error; if it does, something like this will happen: diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index e1cf659..ece820d 100644 *** a/src/backend/commands/async.c --- b/src/backend/commands/async.c *************** Exec_ListenCommit(const char *channel) *** 1148,1153 **** --- 1148,1154 ---- * doesn't seem worth trying to guard against that, but maybe improve this * later. */ + elog(ERROR, "phony OOM in Exec_ListenCommit"); oldcontext = MemoryContextSwitchTo(TopMemoryContext); listenChannels = lappend(listenChannels, pstrdup(channel)); MemoryContextSwitchTo(oldcontext); regression=# begin; BEGIN regression=*# listen foo; LISTEN regression=*# notify foo; NOTIFY regression=*# commit; ERROR: phony OOM in Exec_ListenCommit WARNING: AbortTransaction while in COMMIT state PANIC: cannot abort transaction 21558, it was already committed server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. (The NOTIFY in my example could be replaced by anything that causes the transaction to obtain an XID.) Now, we had skated past this issue in a few places already, such as the above-quoted fragment in Exec_ListenCommit, arguing that the probability of failure there was small enough to tolerate. But I see no such arguments being made in this patch, and I doubt I'd believe it anyway for things like DSA segment creation. So I think there needs to be a serious effort made to move as much as we possibly can of the potentially-risky stuff into PreCommit_Notify. In particular I think we ought to create the shared channel hash entry then, and even insert our PID into it. We could expand the listenersArray entries to include both a PID and a boolean "is it REALLY listening?", and then during Exec_ListenCommit we'd only be required to find an entry we already added and set its boolean, so there's no OOM hazard. Possibly do something similar with the local listenChannelsHash, so as to remove that possibility of OOM failure as well. (An alternative design could be to go ahead and insert our PID during PreCommit_Notify, and just tolerate the small risk of getting signaled when we didn't need to be. But then we'd need some mechanism for cleaning out the bogus entry during AtAbort_Notify.) I'm not sure what I think about the new logic in SignalBackends from this standpoint. Making it very-low-probability-of-error definitely needs some consideration though. regards, tom lane