From: Tom Lane
To: "Matheus Alcantara"
cc: adoros@starfishstorage.com, pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
References: <19480-f1f9fdce30462fc4@postgresql.org> <982975.1779981146@sss.pgh.pa.us>
Comments: In-reply-to "Matheus Alcantara" message dated "Mon, 01 Jun 2026 19:14:34 -0300"
Date: Mon, 01 Jun 2026 19:26:51 -0400
Message-ID: <2868592.1780356411@sss.pgh.pa.us>

"Matheus Alcantara" writes:
> On Thu May 28, 2026 at 12:12 PM -03, Tom Lane wrote:
>> Yeah.
>> The bigger picture though is: if we are re-entrantly calling
>> either a recursive function or a SRF, we should not destroy any of the
>> existing state, nor do we want to replace the function body. The only
>> way to have sane behavior is to keep executing the same function body
>> until the execution instance (recursion level or continued SRF) is
>> done. So these concerns about associated state are only part of the
>> problem.

> I've been exploring the funccache.c approach for plpython. The main
> challenge is that plpython uses SFRM_ValuePerCall for SRFs, whereas
> plpgsql uses SFRM_Materialize. This means plpgsql can simply increment
> use_count at the start of plpgsql_call_handler() and decrement it at the
> end, since all results are produced in a single call. For plpython,
> ExecMakeTableFunctionResult() calls the handler multiple times, with
> use_count returning to zero between calls.

Right. I think what we have to do is maintain the increased use_count
across the whole series of SRF executions and decrement it only once
we're done. That implies that we need some out-of-band mechanism for
decrementing the use_count if the query fails to run the SRF to
completion for whatever reason (error, LIMIT, etc). The first tool
I would reach for is a context reset callback attached to the query's
executor context, but there may be a better answer.

Whether we do it like that or some other way, it might be appropriate
to put infrastructure for it into funccache.c instead of expecting
every PL that wants to use SFRM_ValuePerCall to re-invent this wheel.

> I'm still not sure how to proceed here but It seems like we would need
> some refactoring in plpython to make it work with funccache.

plpython will certainly need some work, but I'm entirely amenable to
also changing funccache if it doesn't support this requirement well.
That module is new as of v18, so it doesn't have much claim to have
a stabilized API yet.

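To make the use_count idea above concrete, here is a toy model in
Python. It is only a sketch of the bookkeeping, not backend code: every
name in it (CachedFunction, QueryContext, SRFExecution,
replace_function, demo) is hypothetical and does not correspond to
anything in funccache.c or plpython. It assumes that a replaced body may
be discarded as soon as its use count drops to zero, which may not be
what the real code would choose to do.

```python
class CachedFunction:
    """Toy stand-in for a cached function body with a use count."""

    def __init__(self, body):
        self.body = body        # callable returning an iterable of rows
        self.use_count = 0      # executions still running this body
        self.stale = False      # set once a newer definition replaces it
        self.freed = False      # set when the body's state is discarded

    def maybe_free(self):
        # Only discard a replaced body once nobody is executing it.
        if self.stale and self.use_count == 0:
            self.freed = True


class QueryContext:
    """Toy stand-in for a per-query context that runs callbacks on reset."""

    def __init__(self):
        self._callbacks = []

    def register_reset_callback(self, callback):
        self._callbacks.append(callback)

    def reset(self):
        # Run callbacks in reverse registration order, each exactly once.
        while self._callbacks:
            self._callbacks.pop()()


class SRFExecution:
    """One value-per-call execution; pins the body across handler calls."""

    def __init__(self, func, ctx):
        self.func = func
        self.rows = iter(func.body())
        self.released = False
        func.use_count += 1                        # held until release()
        ctx.register_reset_callback(self.release)  # covers error/LIMIT exits

    def release(self):
        if not self.released:                      # safe to call twice
            self.released = True
            self.func.use_count -= 1
            self.func.maybe_free()

    def next_row(self):
        assert not self.func.freed, "use-after-free of function body"
        try:
            return next(self.rows)
        except StopIteration:
            self.release()                         # normal completion
            return None


def replace_function(cache, name, new_body):
    """Install a new definition; the old one survives while it is in use."""
    old = cache.get(name)
    cache[name] = CachedFunction(new_body)
    if old is not None:
        old.stale = True
        old.maybe_free()
    return cache[name]


def demo():
    cache = {}
    old = replace_function(cache, "f", lambda: [1, 2, 3])
    ctx = QueryContext()
    srf = SRFExecution(old, ctx)
    first = srf.next_row()
    replace_function(cache, "f", lambda: [10, 20])  # replaced mid-iteration
    second = srf.next_row()                         # still the old body
    pinned = (old.use_count, old.freed)
    ctx.reset()                                     # query abandoned early
    return first, second, pinned, (old.use_count, old.freed)
```

In this model the count goes up once when the SRF starts and comes down
exactly once, either when the iterator is exhausted or when the context
is reset, whichever happens first. Decrementing around each handler call
instead would let the count reach zero between rows, which is the window
the bug report describes.
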
> I've also tried to fix this without funccache, but it seems like we
> would end up implementing something similar anyway.

Yeah, that was my suspicion as well. funccache.c exists because I
realized that SQL-language functions (executor/functions.c) were going
to need logic that plpgsql had had for years.

Actually ... if memory serves, SQL-language functions use ValuePerCall
mode, so there probably already is a solution to this embedded in
functions.c. Did you look at that?

			regards, tom lane