public inbox for [email protected]  
BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
9+ messages / 3 participants

* BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
@ 2026-05-15 11:11 PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: PG Bug reporting form @ 2026-05-15 11:11 UTC
  To: [email protected]; +Cc: [email protected]

The following bug has been logged on the website:

Bug reference:      19480
Logged by:          Andrzej Doros
Email address:      [email protected]
PostgreSQL version: 17.9
Operating system:   Ubuntu 22.04.5 LTS (x86_64), kernel 5.15, glibc 2.35
Description:        

PostgreSQL version: 17.9 (production crash), confirmed identical on 17.10
OS: Ubuntu 22.04.5 LTS, x86_64, kernel 5.15, glibc 2.35
Package: postgresql-plpython3-17 from pgdg apt repository


DESCRIPTION
-----------

A PL/Python set-returning function (SRF) crashes the backend with SIGSEGV
when another session executes CREATE OR REPLACE FUNCTION (or ALTER
FUNCTION) on the same function while the SRF is mid-iteration.

This is a use-after-free. srfstate->savedargs is allocated inside proc->mcxt
by PLy_function_save_args() (plpy_exec.c:503). On each per-call SRF
invocation, plpython3_call_handler calls PLy_procedure_get(), which may call
PLy_procedure_delete(old_proc) -> MemoryContextDelete(old_proc->mcxt) if the
function's pg_proc row has changed (different xmin or ctid). After that,
srfstate->savedargs is a dangling pointer; it is not cleared. The next
PLy_function_restore_args() reads freed memory:

    if (srfstate->savedargs)                  /* non-NULL dangling pointer */
        PLy_function_restore_args(proc, srfstate->savedargs);  /* reads freed mem */

Inside PLy_function_restore_args (plpy_exec.c:551):

    for (i = 0; i < savedargs->nargs; i++)   /* nargs from freed memory */
    {
        if (proc->argnames[i] && ...)
            PyDict_SetItemString(..., proc->argnames[i], ...);

When savedargs->nargs is garbage (e.g. 2056017128 in two production core
dumps), proc->argnames[i] for large i reads an invalid pointer, which is
passed to PyDict_SetItemString -> PyUnicode_FromString -> strlen -> SIGSEGV.
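The lifetime problem can be modelled without any PostgreSQL code. The following is a minimal C sketch, not PostgreSQL's implementation: ToyContext, toy_alloc_zero(), toy_context_delete() and ToySavedArgs are hypothetical stand-ins for a MemoryContext, MemoryContextAllocZero(), MemoryContextDelete() and PLySavedArgs. An allocation made in the per-procedure context is released when that context is deleted, leaving the caller's pointer dangling, while the same allocation made in a context that lives for the whole SRF (the idea behind Option A in the proposed fix) is unaffected.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy memory "context": every allocation is recorded so that deleting the
 * context frees all of them at once, as MemoryContextDelete() does.
 */
typedef struct ToyChunk
{
	struct ToyChunk *next;
	void	   *payload;
} ToyChunk;

typedef struct ToyContext
{
	ToyChunk   *chunks;
	int			nchunks;
} ToyContext;

static void *
toy_alloc_zero(ToyContext *cxt, size_t size)
{
	ToyChunk   *c = malloc(sizeof(ToyChunk));

	if (c == NULL || (c->payload = calloc(1, size)) == NULL)
		abort();
	c->next = cxt->chunks;
	cxt->chunks = c;
	cxt->nchunks++;
	return c->payload;
}

/* Frees every chunk owned by the context in one go. */
static void
toy_context_delete(ToyContext *cxt)
{
	ToyChunk   *c = cxt->chunks;

	while (c != NULL)
	{
		ToyChunk   *next = c->next;

		free(c->payload);
		free(c);
		c = next;
	}
	cxt->chunks = NULL;
	cxt->nchunks = 0;
}

/* Does the context currently own this allocation? */
static int
toy_context_owns(const ToyContext *cxt, const void *ptr)
{
	for (const ToyChunk *c = cxt->chunks; c != NULL; c = c->next)
		if (c->payload == ptr)
			return 1;
	return 0;
}

/* Only the field that turned into garbage in the core dumps. */
typedef struct ToySavedArgs
{
	int			nargs;
} ToySavedArgs;

static ToySavedArgs *
toy_save_args(ToyContext *cxt, int nargs)
{
	ToySavedArgs *s = toy_alloc_zero(cxt, sizeof(ToySavedArgs));

	assert(nargs >= 0);
	s->nargs = nargs;
	return s;
}
```

In the sketch the dangling pointer is never dereferenced, because doing so is undefined behaviour; per the analysis above, that dereference is what PLy_function_restore_args() ends up doing in the backend.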


CRASH STACK (two identical core dumps from production, PG 17.9, Ubuntu 22.04)
-----------------------------------------------------------------------------

#0  __strlen_evex()
#1  PyUnicode_FromString(u=0x69ffff0000)
#2  PyDict_SetItemString(...)
#3  PLy_function_restore_args(proc=..., savedargs=...)
#4  PLy_exec_function(...)
#5  plpython3_call_handler(...)
#6  fmgr_security_definer(...)
#7  ExecMakeTableFunctionResult(...)

State from the newer core dump:

  proc->proname    = "tags_report_plpython"
  proc->nargs      = 1
  proc->argnames[0]= "flavour"
  savedargs->nargs = 2056017128   <- should be 1; contains garbage
  savedargs->namedargs[0] = 'tags' <- still valid (not yet overwritten)
  i = 4                            <- loop has iterated far past argnames[]


TRIGGER CONDITION
-----------------

The pg_proc invalidation reaches Session A's backend when
AcceptInvalidationMessages() is called. This happens when Session A's Python
code calls plpy.execute() with a statement that acquires a NEW relation lock
(e.g. CREATE TEMP TABLE, any table not previously locked in this statement).
Simply calling plpy.execute("SELECT 1") is not sufficient because the lock on
pg_proc is already held and subsequent requests are served from the
per-process lock table without invoking AcceptInvalidationMessages.

In production the trigger is autovacuum on pg_proc (which moves the tuple's
ctid) or any concurrent DDL from another session. Long-running SRFs (hours)
are much more likely to hit this window.


STEPS TO REPRODUCE
------------------

Requires two concurrent sessions and PostgreSQL with plpython3u.

Session A — start and leave running:

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    CREATE OR REPLACE FUNCTION repro_srf(flavour VARCHAR)
    RETURNS TABLE (i BIGINT) AS $$
    import time
    for i in range(100):
        # CREATE TEMP TABLE acquires a new relation lock each iteration,
        # which causes AcceptInvalidationMessages to be called.
        plpy.execute(f"CREATE TEMP TABLE _rt_{i} (x int)")
        plpy.execute(f"DROP TABLE _rt_{i}")
        time.sleep(0.3)
        yield i
    $$ LANGUAGE plpython3u VOLATILE;

    SELECT count(*) FROM repro_srf('test');

Session B — while Session A is running (after ~2 seconds):

    CREATE OR REPLACE FUNCTION repro_srf(flavour VARCHAR)
    RETURNS TABLE (i BIGINT) AS $$
    import time
    for i in range(100):
        plpy.execute(f"CREATE TEMP TABLE _rt_{i} (x int)")
        plpy.execute(f"DROP TABLE _rt_{i}")
        time.sleep(0.3)
        yield i
    $$ LANGUAGE plpython3u VOLATILE;

NOTE: In a minimal test without memory pressure, the freed savedargs memory
is often not overwritten quickly enough to produce a crash: savedargs->nargs
accidentally retains its correct value of 1 and restore_args succeeds. Under
production load (long-running SRF, many Python allocations), the freed region
is overwritten and the crash occurs.

The crash can be triggered deterministically with gdb by setting
savedargs->nargs to a large value immediately after PLy_procedure_delete
fires (see gdb script below). This produces the identical crash stack seen in
production.


GDB CONFIRMATION (PostgreSQL 17.10)
-------------------------------------

The following gdb session was used to confirm the exact sequence:

  (gdb) b PLy_procedure_delete
  (gdb) commands 1
  > printf "DELETE proname=%s mcxt=%p\n", proc->proname, proc->mcxt
  > set $corrupt_next = 1
  > c
  > end
  (gdb) b PLy_function_restore_args
  (gdb) commands 2
  > if $corrupt_next
  >   set {int}((long)savedargs + 24) = 2056017128
  >   set $corrupt_next = 0
  > end
  > c
  > end

Output:

  DELETE proname=repro_srf mcxt=0x5686641e1b20
  [PLy_function_restore_args fires with savedargs=0x5686641e28e8]
  [nargs set to 2056017128]
  Program received signal SIGSEGV, Segmentation fault.
  __strlen_avx2 ()

PostgreSQL log:
  server process (PID 366) was terminated by signal 11: Segmentation fault
  all server processes terminated; reinitializing


AFFECTED CODE
-------------

src/pl/plpython/plpy_exec.c, lines 503-506:
  PLy_function_save_args allocates savedargs in proc->mcxt

src/pl/plpython/plpy_exec.c, lines 117-119:
  PLy_function_restore_args is called with potentially dangling savedargs
  (no check whether proc was rebuilt since savedargs was created)

src/pl/plpython/plpy_procedure.c, line 405 (PLy_procedure_delete):
  MemoryContextDelete(proc->mcxt) frees savedargs without nulling
  srfstate->savedargs


PROPOSED FIX
------------

The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
be deleted at any per-call boundary) rather than to
funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).

Option A — allocate savedargs in funcctx->multi_call_memory_ctx:
Change PLy_function_save_args to accept a MemoryContext parameter and pass
funcctx->multi_call_memory_ctx from PLy_exec_function. The saved PyObject*
references are valid regardless of which MemoryContext holds the struct.

Option B — detect proc rebuild and discard stale savedargs:
After PLy_procedure_get returns a new proc, check whether it differs from the
proc that created srfstate->savedargs. If so, discard savedargs
(PLy_function_drop_args or simply set to NULL) and skip the restore.
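Option B could look roughly like the following sketch, assuming a per-procedure generation counter that is bumped whenever the procedure is rebuilt. ToyProc, ToySRFState and the generation field are hypothetical; the real PLyProcedure has no such counter, though it does record the pg_proc row's xmin and TID (fn_xmin, fn_tid), which might serve the same purpose. Comparing a generation rather than the proc pointer avoids a false match if the rebuilt struct happens to be allocated at the same address as the deleted one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyProc
{
	uint64_t	generation;		/* hypothetical: bumped on every rebuild */
	int			nargs;
} ToyProc;

typedef struct ToySavedArgs
{
	int			nargs;
} ToySavedArgs;

typedef struct ToySRFState
{
	ToySavedArgs *savedargs;
	uint64_t	saved_generation;	/* generation of the proc that saved them */
} ToySRFState;

/* Save (or re-save) argument values at the end of a non-final call. */
static void
toy_srf_save_args(ToySRFState *st, const ToyProc *proc)
{
	assert(proc != NULL);
	free(st->savedargs);
	st->savedargs = malloc(sizeof(ToySavedArgs));
	if (st->savedargs == NULL)
		abort();
	st->savedargs->nargs = proc->nargs;
	st->saved_generation = proc->generation;
}

/*
 * Run after the procedure lookup on each later call.  Returns 1 if the saved
 * args belong to this proc and may be restored, 0 otherwise.
 */
static int
toy_srf_check_savedargs(ToySRFState *st, const ToyProc *proc)
{
	if (st->savedargs == NULL)
		return 0;
	if (st->saved_generation != proc->generation)
	{
		/*
		 * Stale.  In this toy, savedargs is malloc'd independently of the
		 * proc, so it can be freed.  If it lived in the deleted proc->mcxt,
		 * it would already be gone and could only be forgotten.
		 */
		free(st->savedargs);
		st->savedargs = NULL;
		return 0;
	}
	return 1;
}
```

Note that this only avoids reading freed memory; the generator would still be driven with state from the old procedure, which is the wider concern raised later in the thread.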









* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
@ 2026-05-25 22:26 ` Matheus Alcantara <[email protected]>
  2026-05-28 14:10   ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-05-28 15:12   ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  0 siblings, 2 replies; 9+ messages in thread

From: Matheus Alcantara @ 2026-05-25 22:26 UTC
  To: [email protected]; [email protected]

On Fri May 15, 2026 at 8:11 AM -03, PG Bug reporting form wrote:
> The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
> be deleted at any per-call boundary) rather than to
> funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).
>
> Option A — allocate savedargs in funcctx->multi_call_memory_ctx:
> Change PLy_function_save_args to accept a MemoryContext parameter and pass
> funcctx->multi_call_memory_ctx from PLy_exec_function. The saved PyObject*
> references are valid regardless of which MemoryContext holds the struct.
>
> Option B — detect proc rebuild and discard stale savedargs:
> After PLy_procedure_get returns a new proc, check whether it differs from
> the
> proc that created srfstate->savedargs. If so, discard savedargs
> (PLy_function_drop_args or simply set to NULL) and skip the restore.
>

Hi, thank you for the very detailed bug report. I've managed to
reproduce the issue on master.

Option A seems to fix the issue (see attached patch), but while playing
with this I found another issue that I think is related:

CREATE OR REPLACE FUNCTION trigger_stack_overflow(x BIGINT)
RETURNS TABLE(i BIGINT) AS $$
    import time
    plpy.execute(f"CREATE TEMP TABLE _rt_{x} (x int)")
    plpy.execute(f"DROP TABLE _rt_{x}")
    time.sleep(0.3)
    plpy.execute("SELECT trigger_stack_overflow(1)")
    yield x
$$ LANGUAGE plpython3u VOLATILE;

Run SELECT trigger_stack_overflow(1), then execute the CREATE OR REPLACE
in another session and wait for the first session to crash with this
stack trace:
frame #3: 0x000000010554a694 postgres`ExceptionalCondition(conditionName="proc->calldepth > 0", fileName="../src/pl/plpython/plpy_exec.c", lineNumber=701) at assert.c:65:2
frame #4: 0x0000000105e41984 plpython3.dylib`PLy_global_args_pop(proc=0x000000014b03cf00) at plpy_exec.c:701:2
frame #5: 0x0000000105e40d94 plpython3.dylib`PLy_exec_function(fcinfo=0x000000011e077738, proc=0x000000014b03cf00) at plpy_exec.c:264:3

The expected output from the first session should be something like
this:

ERROR:  54001: error fetching next item from iterator
DETAIL:  spiexceptions.StatementTooComplex: error fetching next item from iterator
HINT:  Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.

This is because when PLy_procedure_delete() is called from
PLy_procedure_get(), it also destroys the state used by recursive calls,
such as "calldepth", "argstack" and "globals". That causes the assertion
failure Assert(proc->calldepth > 0) in PLy_global_args_pop() when it runs
in the PG_CATCH block of PLy_exec_function(), or EXC_BAD_ACCESS when
"argstack" or "globals" is accessed.
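A reduced model of that recursion bookkeeping, as a hedged C sketch: ToyProc, toy_args_push() and toy_args_pop() are hypothetical simplifications of PLyProcedure's calldepth/argstack and of PLy_global_args_push()/PLy_global_args_pop(). It simplifies the failure by having the unwinding code see the freshly built procedure (whose counters start at zero) rather than freed memory; either way, the depth the outer levels rely on is gone, so the pop's invariant no longer holds.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyProc
{
	int			calldepth;		/* active executions of this proc */
	int			nsaved;			/* saved-arg frames; stands in for argstack */
} ToyProc;

/* A (re)built procedure always starts with empty recursion bookkeeping. */
static ToyProc *
toy_proc_create(void)
{
	ToyProc    *p = calloc(1, sizeof(ToyProc));

	if (p == NULL)
		abort();
	return p;
}

/* Entering the function: only a nested call has outer args to save. */
static void
toy_args_push(ToyProc *p)
{
	assert(p->calldepth >= 0);
	if (p->calldepth > 0)
		p->nsaved++;
	p->calldepth++;
}

/*
 * Leaving the function.  Returns 0 where PLy_global_args_pop() would instead
 * fail Assert(proc->calldepth > 0).
 */
static int
toy_args_pop(ToyProc *p)
{
	if (p->calldepth <= 0)
		return 0;
	if (p->nsaved > 0)
		p->nsaved--;
	p->calldepth--;
	return 1;
}
```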

Although changing the memory context in which savedargs is allocated
fixes the reported issue, I think the long-term fix is to preserve the
necessary execution state across PLyProcedure re-creation. I'm still
studying the code to see whether and how this can be implemented.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

From 61f46abd4509cc519de3e43adfd9e0b4fa0f6fcb Mon Sep 17 00:00:00 2001
From: Matheus Alcantara <[email protected]>
Date: Mon, 25 May 2026 19:22:09 -0300
Subject: [PATCH] plpython: Use correct memory context for savedargs

---
 src/pl/plpython/plpy_exec.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index de0dad1f533..d93e800e0be 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -31,7 +31,7 @@ typedef struct PLySRFState
 } PLySRFState;
 
 static PyObject *PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc);
-static PLySavedArgs *PLy_function_save_args(PLyProcedure *proc);
+static PLySavedArgs *PLy_function_save_args(MemoryContext mctx, PLyProcedure *proc);
 static void PLy_function_restore_args(PLyProcedure *proc, PLySavedArgs *savedargs);
 static void PLy_function_drop_args(PLySavedArgs *savedargs);
 static void PLy_global_args_push(PLyProcedure *proc);
@@ -176,8 +176,15 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 				 * This won't be last call, so save argument values.  We do
 				 * this again each time in case the iterator is changing those
 				 * values.
+				 *
+				 * We use funcctx->multi_call_memory_ctx to ensure savedargs
+				 * survives across ValuePerCall invocations, but is cleaned up
+				 * when the SRF completes.  This also protects against the
+				 * case where the procedure is deleted (via
+				 * PLy_procedure_delete()) while the SRF is running.
 				 */
-				srfstate->savedargs = PLy_function_save_args(proc);
+				srfstate->savedargs = PLy_function_save_args(funcctx->multi_call_memory_ctx,
+															 proc);
 			}
 		}
 
@@ -536,13 +543,13 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
  * available via the proc's globals :-( ... but we're stuck with that now.
  */
 static PLySavedArgs *
-PLy_function_save_args(PLyProcedure *proc)
+PLy_function_save_args(MemoryContext mctx, PLyProcedure *proc)
 {
 	PLySavedArgs *result;
 
-	/* saved args are always allocated in procedure's context */
+	/* Allocate in the caller-specified memory context */
 	result = (PLySavedArgs *)
-		MemoryContextAllocZero(proc->mcxt,
+		MemoryContextAllocZero(mctx,
 							   offsetof(PLySavedArgs, namedargs) +
 							   proc->nargs * sizeof(PyObject *));
 	result->nargs = proc->nargs;
@@ -658,8 +665,14 @@ PLy_global_args_push(PLyProcedure *proc)
 	{
 		PLySavedArgs *node;
 
-		/* Build a struct containing current argument values */
-		node = PLy_function_save_args(proc);
+		/*
+		 * Build a struct containing current argument values.  We use
+		 * proc->mcxt because the saved args must persist across the entire
+		 * recursive call stack, which can span multiple function invocations.
+		 * The procedure's memory context has the appropriate lifetime for
+		 * this, and we explicitly free the struct when popping.
+		 */
+		node = PLy_function_save_args(proc->mcxt, proc);
 
 		/*
 		 * Push the saved argument values into the procedure's stack.  Once we
-- 
2.50.1 (Apple Git-155)



Attachments:

  [text/plain] 0001-plpython-Use-correct-memory-context-for-savedargs.patch (3.1K, 2-0001-plpython-Use-correct-memory-context-for-savedargs.patch)

* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
@ 2026-05-28 14:10   ` Matheus Alcantara <[email protected]>
  1 sibling, 0 replies; 9+ messages in thread

From: Matheus Alcantara @ 2026-05-28 14:10 UTC
  To: [email protected]; [email protected]

On 25/05/26 19:26, Matheus Alcantara wrote:
> On Fri May 15, 2026 at 8:11 AM -03, PG Bug reporting form wrote:
>> The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
>> be deleted at any per-call boundary) rather than to
>> funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).
>>
>> Option A — allocate savedargs in funcctx->multi_call_memory_ctx:
>> Change PLy_function_save_args to accept a MemoryContext parameter and pass
>> funcctx->multi_call_memory_ctx from PLy_exec_function. The saved PyObject*
>> references are valid regardless of which MemoryContext holds the struct.
>>
>> Option B — detect proc rebuild and discard stale savedargs:
>> After PLy_procedure_get returns a new proc, check whether it differs from
>> the
>> proc that created srfstate->savedargs. If so, discard savedargs
>> (PLy_function_drop_args or simply set to NULL) and skip the restore.
>>
> 
> Hi, thank you for the very detailed bug report. I've managed to
> reproduce the issue on master.
> 
> Option A seems to fix the issue (see attached patch) but I've found
> another issue while playing with this that I think it's related:
> 
> ...
> 
> This is because when PLy_procedure_delete() is executed on
> PLy_procedure_get() it also destroy information related with recursive
> functions, such as "calldepth", "argstack" and "globals" which cause the
> assert failure Assert(proc->calldepth > 0) on PLy_global_args_pop() when
> it's executed on PG_CATCH block on PLy_exec_function() or EXC_BAD_ACCESS
> when accessing "argstack" or "globals".
> 
> Although changing the memory context where savedargs is allocated fix
> the reported issue I think that the long term fix is to preserve such
> necessary execution information during PLyProcedure re-creation. I'm
> still studying the code to see if and how this can implemented.
> 

This is proving tricky to debug. I'm not able to reproduce the issue
with assertions disabled, not even with the steps shared in the bug
report.

Andrzej, could you please confirm whether you hit this failure with
assertions enabled? And if so, could you please check whether it also
happens with assertions disabled?

Also, 17.10 was released a few weeks ago; could you also test against
this new minor release?

--
Matheus Alcantara
EDB: https://www.enterprisedb.com






* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
@ 2026-05-28 15:12   ` Tom Lane <[email protected]>
  2026-06-01 22:14     ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  1 sibling, 1 reply; 9+ messages in thread

From: Tom Lane @ 2026-05-28 15:12 UTC
  To: Matheus Alcantara <[email protected]>; +Cc: [email protected]; [email protected]

"Matheus Alcantara" <[email protected]> writes:
> On Fri May 15, 2026 at 8:11 AM -03, PG Bug reporting form wrote:
>> The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
>> be deleted at any per-call boundary) rather than to
>> funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).

> Option A seems to fix the issue (see attached patch) but I've found
> another issue while playing with this that I think it's related:
> ...
> This is because when PLy_procedure_delete() is executed on
> PLy_procedure_get() it also destroy information related with recursive
> functions, such as "calldepth", "argstack" and "globals" which cause the
> assert failure Assert(proc->calldepth > 0) on PLy_global_args_pop() when
> it's executed on PG_CATCH block on PLy_exec_function() or EXC_BAD_ACCESS
> when accessing "argstack" or "globals".

Yeah.  The bigger picture though is: if we are re-entrantly calling
either a recursive function or a SRF, we should not destroy any of the
existing state, nor do we want to replace the function body.  The only
way to have sane behavior is to keep executing the same function body
until the execution instance (recursion level or continued SRF) is
done.  So these concerns about associated state are only part of the
problem.

plpgsql ran into this years ago, and its solution has been to maintain
a reference count on each function parsetree and not destroy an
obsoleted parsetree till the reference count goes to zero.  I've had
in the back of my head that the other PLs need to do likewise, but it
hasn't gotten to the front of the to-do list, mainly because the other
PLs are much less used and so field complaints about this have been
rare.  I had hoped also that the language interpreters underlying the
other PLs might solve some of this for us, but it's unclear to what
extent they help.  Certainly it's not cool to be clobbering our own
execution state that's outside the language interpreter.
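The reference-counting idea can be sketched in a few lines of C. This is a toy with a single cached entry, not plpgsql's or funccache.c's actual code; ToyFunc, ToyCache, toy_acquire() and toy_release() are hypothetical names. A stale entry is unlinked from the cache as soon as a newer catalog version is seen, but it is only freed once the last execution using it releases it, so an in-flight recursion level or SRF keeps running the old body.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyFunc
{
	unsigned	version;		/* stands in for the pg_proc row identity */
	int			use_count;		/* executions currently using this body */
	int			obsolete;		/* superseded; free once use_count hits 0 */
} ToyFunc;

typedef struct ToyCache
{
	ToyFunc    *current;		/* single-entry "hash table" for one OID */
	int			nfreed;			/* bodies released so far (for the demo) */
} ToyCache;

static ToyFunc *
toy_func_new(unsigned version)
{
	ToyFunc    *f = calloc(1, sizeof(ToyFunc));

	if (f == NULL)
		abort();
	f->version = version;
	return f;
}

static void
toy_func_free(ToyCache *c, ToyFunc *f)
{
	free(f);
	c->nfreed++;
}

/* Look up (compiling if needed) and pin the body for catalog `version`. */
static ToyFunc *
toy_acquire(ToyCache *c, unsigned version)
{
	ToyFunc    *f = c->current;

	if (f != NULL && f->version != version)
	{
		/* Stale: unlink now, but free only if nobody is executing it. */
		c->current = NULL;
		if (f->use_count == 0)
			toy_func_free(c, f);
		else
			f->obsolete = 1;
		f = NULL;
	}
	if (f == NULL)
		f = c->current = toy_func_new(version);
	f->use_count++;
	return f;
}

static void
toy_release(ToyCache *c, ToyFunc *f)
{
	assert(f->use_count > 0);
	f->use_count--;
	if (f->use_count == 0 && f->obsolete)
		toy_func_free(c, f);
}
```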

We might want to go as far as converting the other PLs to use the
utils/cache/funccache.c infrastructure, but perhaps there is a
less invasive fix.  Certainly, a fix based on funccache.c could not
be back-patched.  (On the other hand, given the rarity of complaints,
perhaps a HEAD-only fix is acceptable.)

			regards, tom lane







* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-05-28 15:12   ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
@ 2026-06-01 22:14     ` Matheus Alcantara <[email protected]>
  2026-06-01 23:26       ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Matheus Alcantara @ 2026-06-01 22:14 UTC
  To: Tom Lane <[email protected]>; +Cc: [email protected]; [email protected]

On Thu May 28, 2026 at 12:12 PM -03, Tom Lane wrote:
> "Matheus Alcantara" <[email protected]> writes:
>> On Fri May 15, 2026 at 8:11 AM -03, PG Bug reporting form wrote:
>>> The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
>>> be deleted at any per-call boundary) rather than to
>>> funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).
>
>> Option A seems to fix the issue (see attached patch) but I've found
>> another issue while playing with this that I think it's related:
>> ...
>> This is because when PLy_procedure_delete() is executed on
>> PLy_procedure_get() it also destroy information related with recursive
>> functions, such as "calldepth", "argstack" and "globals" which cause the
>> assert failure Assert(proc->calldepth > 0) on PLy_global_args_pop() when
>> it's executed on PG_CATCH block on PLy_exec_function() or EXC_BAD_ACCESS
>> when accessing "argstack" or "globals".
>
> Yeah.  The bigger picture though is: if we are re-entrantly calling
> either a recursive function or a SRF, we should not destroy any of the
> existing state, nor do we want to replace the function body.  The only
> way to have sane behavior is to keep executing the same function body
> until the execution instance (recursion level or continued SRF) is
> done.  So these concerns about associated state are only part of the
> problem.
>
> plpgsql ran into this years ago, and its solution has been to maintain
> a reference count on each function parsetree and not destroy an
> obsoleted parsetree till the reference count goes to zero.  I've had
> in the back of my head that the other PLs need to do likewise, but it
> hasn't gotten to the front of the to-do list, mainly because the other
> PLs are much less used and so field complaints about this have been
> rare.  I had hoped also that the language interpreters underlying the
> other PLs might solve some of this for us, but it's unclear to what
> extent they help.  Certainly it's not cool to be clobbering our own
> execution state that's outside the language interpreter.
>
> We might want to go as far as converting the other PLs to use the
> utils/cache/funccache.c infrastructure, but perhaps there is a
> less invasive fix.  Certainly, a fix based on funccache.c could not
> be back-patched.  (On the other hand, given the rarity of complaints,
> perhaps a HEAD-only fix is acceptable.)
>

I've been exploring the funccache.c approach for plpython. The main
challenge is that plpython uses SFRM_ValuePerCall for SRFs, whereas
plpgsql uses SFRM_Materialize. This means plpgsql can simply increment
use_count at the start of plpgsql_call_handler() and decrement it at the
end, since all results are produced in a single call. For plpython,
ExecMakeTableFunctionResult() calls the handler multiple times, with
use_count returning to zero between calls.

With ValuePerCall, cached_function_compile() may try to re-create an
invalid cache entry because use_count can be 0 while
ExecMakeTableFunctionResult() is in the middle of its loop. In that
case, the SRFState would be lost for the currently running plpython
function.
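The problem with pinning per call can be shown with a toy counter. In this hypothetical sketch (ToyEntry, toy_per_call() and toy_lookup() are illustrative names, not funccache.c API), each ValuePerCall invocation pins and unpins around producing one row, the way a plpgsql-style handler would; because the count is back at zero between rows, a stale-entry check at the start of the next call is free to rebuild.

```c
#include <assert.h>

typedef struct ToyEntry
{
	int			use_count;		/* executions currently pinning the entry */
	int			rebuilds;		/* times the cached body was replaced */
} ToyEntry;

/* Lookup step: a stale entry may only be replaced while nobody uses it. */
static void
toy_lookup(ToyEntry *e, int stale)
{
	if (stale && e->use_count == 0)
		e->rebuilds++;			/* old body, and SRF state tied to it, is gone */
}

/* One ValuePerCall invocation with per-call pin/unpin around the work. */
static void
toy_per_call(ToyEntry *e, int stale)
{
	toy_lookup(e, stale);
	e->use_count++;
	/* ... produce one row of the SRF ... */
	e->use_count--;
	assert(e->use_count >= 0);
}
```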

I'm still not sure how to proceed here, but it seems like we would need
some refactoring in plpython to make it work with funccache. I'm not
sure whether changing ValuePerCall to Materialize is the way to go, or
whether there's another way to fix this.

I've also tried to fix this without funccache, but it seems like we
would end up implementing something similar anyway. That might be a way
to go, but I'm also not sure if it's the best path.

Thoughts?

--
Matheus Alcantara
EDB: https://www.enterprisedb.com






* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-05-28 15:12   ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  2026-06-01 22:14     ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
@ 2026-06-01 23:26       ` Tom Lane <[email protected]>
  2026-06-05 18:09         ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Tom Lane @ 2026-06-01 23:26 UTC
  To: Matheus Alcantara <[email protected]>; +Cc: [email protected]; [email protected]

"Matheus Alcantara" <[email protected]> writes:
> On Thu May 28, 2026 at 12:12 PM -03, Tom Lane wrote:
>> Yeah.  The bigger picture though is: if we are re-entrantly calling
>> either a recursive function or a SRF, we should not destroy any of the
>> existing state, nor do we want to replace the function body.  The only
>> way to have sane behavior is to keep executing the same function body
>> until the execution instance (recursion level or continued SRF) is
>> done.  So these concerns about associated state are only part of the
>> problem.

> I've been exploring the funccache.c approach for plpython. The main
> challenge is that plpython uses SFRM_ValuePerCall for SRFs, whereas
> plpgsql uses SFRM_Materialize. This means plpgsql can simply increment
> use_count at the start of plpgsql_call_handler() and decrement it at the
> end, since all results are produced in a single call. For plpython,
> ExecMakeTableFunctionResult() calls the handler multiple times, with
> use_count returning to zero between calls.

Right.  I think what we have to do is maintain the increased use_count
across the whole series of SRF executions and decrement it only once
we're done.  That implies that we need some out-of-band mechanism for
decrementing the use_count if the query fails to run the SRF to
completion for whatever reason (error, LIMIT, etc).  The first tool
I would reach for is a context reset callback attached to the query's
executor context, but there may be a better answer.  Whether we do it
like that or some other way, it might be appropriate to put
infrastructure for it into funccache.c instead of expecting every PL
that wants to use SFRM_ValuePerCall to re-invent this wheel.
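Tom's suggestion can be modeled outside the backend. The standalone C sketch below is not PostgreSQL code: `CachedFunc`, `QueryContext`, `context_register_reset_callback()`, `context_reset()`, and `srf_call()` are hypothetical stand-ins for funccache.c's CachedFunction, the query's executor memory context, MemoryContextRegisterResetCallback(), and a ValuePerCall SRF. The SRF takes one reference on its first call and keeps it across every later call; a callback registered on the query context drops the reference when the context is reset, even if the set was never run to completion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for funccache.c's CachedFunction. */
typedef struct CachedFunc
{
    int         use_count;      /* references held by active executions */
} CachedFunc;

/* Hypothetical stand-in for a memory-context reset callback. */
typedef struct ResetCallback
{
    void        (*func) (void *arg);
    void       *arg;
    struct ResetCallback *next;
} ResetCallback;

/* Hypothetical stand-in for the query's executor memory context. */
typedef struct QueryContext
{
    ResetCallback *callbacks;
} QueryContext;

static void
context_register_reset_callback(QueryContext *cxt, ResetCallback *cb)
{
    cb->next = cxt->callbacks;
    cxt->callbacks = cb;
}

/* Run (and forget) every registered callback, as a context reset would. */
static void
context_reset(QueryContext *cxt)
{
    ResetCallback *cb = cxt->callbacks;

    cxt->callbacks = NULL;
    while (cb != NULL)
    {
        ResetCallback *next = cb->next;

        cb->func(cb->arg);
        cb = next;
    }
}

/* Per-execution SRF state: one reference held for the whole series. */
typedef struct SrfExecution
{
    CachedFunc *func;           /* entry we hold a reference on, or NULL */
    int         started;        /* has the first call happened? */
    int         rows_left;      /* rows the set will still produce */
    ResetCallback release_cb;   /* out-of-band release hook */
} SrfExecution;

/* Drop the reference if still held; safe to call more than once. */
static void
release_reference(void *arg)
{
    SrfExecution *exec = arg;

    if (exec->func != NULL)
    {
        assert(exec->func->use_count > 0);
        exec->func->use_count--;
        exec->func = NULL;
    }
}

/* One ValuePerCall invocation: 1 means a row was returned, 0 means done. */
static int
srf_call(SrfExecution *exec, CachedFunc *func, QueryContext *query_cxt)
{
    if (!exec->started)
    {
        /* First call only: so use_count never drops to zero between calls. */
        exec->started = 1;
        exec->func = func;
        func->use_count++;
        exec->release_cb.func = release_reference;
        exec->release_cb.arg = exec;
        context_register_reset_callback(query_cxt, &exec->release_cb);
    }

    if (exec->rows_left == 0)
    {
        release_reference(exec);    /* normal completion */
        return 0;
    }
    exec->rows_left--;
    return 1;
}
```

Calling `srf_call()` twice on a three-row set and then `context_reset()` models a `LIMIT 2`: the reference survives both calls and is dropped only by the reset callback.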

> I'm still not sure how to proceed here, but it seems like we would need
> some refactoring in plpython to make it work with funccache.

plpython will certainly need some work, but I'm entirely amenable to
also changing funccache if it doesn't support this requirement well.
That module is new as of v18, so it doesn't have much claim to have
a stabilized API yet.

> I've also tried to fix this without funccache, but it seems like we
> would end up implementing something similar anyway.

Yeah, that was my suspicion as well.  funccache.c exists because
I realized that SQL-language functions (executor/functions.c) were
going to need logic that plpgsql had had for years.

Actually ... if memory serves, SQL-language functions use ValuePerCall
mode, so there probably already is a solution to this embedded in
functions.c.  Did you look at that?

			regards, tom lane


* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-05-28 15:12   ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  2026-06-01 22:14     ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-06-01 23:26       ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
@ 2026-06-05 18:09         ` Matheus Alcantara <[email protected]>
  2026-06-05 19:11           ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Matheus Alcantara @ 2026-06-05 18:09 UTC
  To: Tom Lane <[email protected]>; +Cc: [email protected]; [email protected]

On Mon Jun 1, 2026 at 8:26 PM -03, Tom Lane wrote:
> Yeah, that was my suspicion as well.  funccache.c exists because
> I realized that SQL-language functions (executor/functions.c) were
> going to need logic that plpgsql had had for years.
>
> Actually ... if memory serves, SQL-language functions use ValuePerCall
> mode, so there probably already is a solution to this embedded in
> functions.c.  Did you look at that?
>

I didn't look at this before, but this was exactly the right call. SQL
functions handle this by maintaining a per-call-site cache struct
(SQLFunctionCache) in fn_extra that holds both the pointer to the
long-lived hash entry and the execution state. The use_count is
incremented when we first obtain the function and decremented via a
MemoryContextCallback when fn_mcxt is deleted.
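The functions.c pattern described above can be reduced to a small standalone model. In this C sketch, `HashEntry`, `CallSiteCache`, `FmgrInfoModel`, `get_call_site_cache()`, and `release_call_site_cache()` are all hypothetical; the `current` argument plays the role of whatever cached_function_compile() would return after revalidation, and an explicit release function replaces the fn_mcxt memory-context callback. It also includes the resume shortcut used by the patch's PLy_procedure_get(): while an SRF is mid-iteration, the call site keeps the entry it started with instead of switching to a newer definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a long-lived funccache.c hash entry. */
typedef struct HashEntry
{
    int         use_count;      /* number of call sites referencing it */
    int         version;        /* which definition it was compiled from */
} HashEntry;

/* Hypothetical stand-in for the per-call-site struct kept in fn_extra. */
typedef struct CallSiteCache
{
    HashEntry  *entry;          /* referenced entry, or NULL */
    int         srf_active;     /* nonzero while an SRF is mid-iteration */
} CallSiteCache;

/* Hypothetical stand-in for FmgrInfo; only fn_extra matters here. */
typedef struct FmgrInfoModel
{
    void       *fn_extra;
} FmgrInfoModel;

static CallSiteCache *
get_call_site_cache(FmgrInfoModel *finfo, HashEntry *current)
{
    CallSiteCache *cache = finfo->fn_extra;

    /* First execution through this FmgrInfo: create the cache lazily. */
    if (cache == NULL)
    {
        cache = calloc(1, sizeof(CallSiteCache));
        if (cache == NULL)
            abort();
        finfo->fn_extra = cache;
    }

    /* Resuming an SRF: keep running the definition we started with. */
    if (cache->srf_active && cache->entry != NULL)
        return cache;

    /* Switched entries: release the old reference, take a new one. */
    if (current != cache->entry)
    {
        if (cache->entry != NULL)
        {
            assert(cache->entry->use_count > 0);
            cache->entry->use_count--;
        }
        cache->entry = current;
        current->use_count++;
    }
    return cache;
}

/*
 * Counterpart of the fn_mcxt reset callback.  Here we also free the struct,
 * since there is no memory context to reclaim it for us.
 */
static void
release_call_site_cache(FmgrInfoModel *finfo)
{
    CallSiteCache *cache = finfo->fn_extra;

    if (cache == NULL)
        return;
    if (cache->entry != NULL)
    {
        assert(cache->entry->use_count > 0);
        cache->entry->use_count--;
    }
    free(cache);
    finfo->fn_extra = NULL;
}
```

With `srf_active` set, a redefinition (a new `current` entry) is ignored until the iteration ends; the next lookup after that moves the call site's reference to the new entry.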

I've adapted the same approach for PL/Python. The main changes are:

PLyProcedure now embeds CachedFunction as its first member and is
managed by cached_function_compile(). A new PLyProcedureCache struct
lives in fn_extra and holds the pointer to PLyProcedure plus SRF state.
For cleanup, I use a MemoryContextCallback on fn_mcxt to decrement
use_count, and an ExprContextCallback to clean up Python iterator state
when the SRF is interrupted.
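The cleanup split described above (normal exhaustion unregisters the callback and cleans up directly; an interrupted SRF is cleaned up when the expression context shuts down) can be sketched as a standalone C model. `ExprContextModel`, `SrfCache`, `register_shutdown()`, `unregister_shutdown()`, and `shutdown_expr_context()` are hypothetical stand-ins for an ExprContext, PLyProcedureCache, RegisterExprContextCallback(), UnregisterExprContextCallback(), and expression-context shutdown; `iter_refs` stands in for the reference held on the Python iterator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of an ExprContext's shutdown-callback list. */
typedef struct ShutdownCallback
{
    void        (*func) (void *arg);
    void       *arg;
    struct ShutdownCallback *next;
} ShutdownCallback;

typedef struct ExprContextModel
{
    ShutdownCallback *callbacks;
} ExprContextModel;

static void
register_shutdown(ExprContextModel *econtext, void (*func) (void *), void *arg)
{
    ShutdownCallback *cb = malloc(sizeof(ShutdownCallback));

    if (cb == NULL)
        abort();
    cb->func = func;
    cb->arg = arg;
    cb->next = econtext->callbacks;
    econtext->callbacks = cb;
}

static void
unregister_shutdown(ExprContextModel *econtext, void (*func) (void *), void *arg)
{
    ShutdownCallback **prev = &econtext->callbacks;

    while (*prev != NULL)
    {
        ShutdownCallback *cb = *prev;

        if (cb->func == func && cb->arg == arg)
        {
            *prev = cb->next;
            free(cb);
            return;
        }
        prev = &cb->next;
    }
}

/* Run and discard every callback, as shutting down the context would. */
static void
shutdown_expr_context(ExprContextModel *econtext)
{
    while (econtext->callbacks != NULL)
    {
        ShutdownCallback *cb = econtext->callbacks;

        econtext->callbacks = cb->next;
        cb->func(cb->arg);
        free(cb);
    }
}

/* Hypothetical per-call-site SRF state, loosely mirroring PLyProcedureCache. */
typedef struct SrfCache
{
    int         active;         /* like pcache->srfstate != NULL */
    int         iter_refs;      /* stand-in for the iterator's refcount */
    int         shutdown_reg;   /* like pcache->shutdown_reg */
    int         rows_left;      /* rows the iterator will still yield */
} SrfCache;

static void
shutdown_srf(void *arg)
{
    SrfCache   *cache = arg;

    cache->shutdown_reg = 0;
    cache->iter_refs = 0;       /* release the iterator */
    cache->active = 0;          /* discard the SRF state */
}

/* One ValuePerCall step: 1 means a row was returned, 0 means done. */
static int
srf_step(SrfCache *cache, ExprContextModel *econtext)
{
    if (!cache->active)
    {
        cache->active = 1;
        cache->iter_refs = 1;
        register_shutdown(econtext, shutdown_srf, cache);
        cache->shutdown_reg = 1;
    }

    if (cache->rows_left == 0)
    {
        /* Normal exhaustion: unregister first, then clean up directly. */
        if (cache->shutdown_reg)
        {
            unregister_shutdown(econtext, shutdown_srf, cache);
            cache->shutdown_reg = 0;
        }
        cache->iter_refs = 0;
        cache->active = 0;
        return 0;
    }
    cache->rows_left--;
    return 1;
}
```

Unregistering on normal exhaustion matters because the state is freed at that point; a callback left registered would later run against freed memory.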

Since fn_extra is now used for PLyProcedureCache, I had to remove the
SRF macros and switch to direct isDone signaling via ReturnSetInfo,
which is how SQL functions do it anyway.
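Without the SRF_* macros, the function reports its status directly through ReturnSetInfo's isDone field on every call: ExprMultipleResult with each row, and ExprEndResult (with a null result) once the set is exhausted. The standalone C sketch below models that contract. `ReturnSetInfoModel`, `CounterState`, `counter_srf()`, and `collect()` are hypothetical; the enum only mirrors the names of PostgreSQL's ExprDoneCond values, and `collect()` is a loose stand-in for the executor's ValuePerCall loop, including a row limit that can stop it early.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the names of PostgreSQL's ExprDoneCond values. */
typedef enum ExprDoneCondModel
{
    ExprSingleResult,           /* expression does not return a set */
    ExprMultipleResult,         /* this result is an element of a set */
    ExprEndResult               /* there are no more elements in the set */
} ExprDoneCondModel;

/* Hypothetical stand-in for the ReturnSetInfo field used here. */
typedef struct ReturnSetInfoModel
{
    ExprDoneCondModel isDone;   /* set by the function on each call */
} ReturnSetInfoModel;

/* Hypothetical per-call-site state (the role fn_extra plays). */
typedef struct CounterState
{
    bool        started;
    int         next;
    int         stop;
} CounterState;

/* A ValuePerCall SRF yielding start..stop, signaling via rsi->isDone. */
static int
counter_srf(CounterState *state, int start, int stop,
            ReturnSetInfoModel *rsi, bool *isnull)
{
    if (!state->started)
    {
        state->started = true;
        state->next = start;
        state->stop = stop;
    }

    if (state->next > state->stop)
    {
        state->started = false; /* reset so a rescan starts over */
        rsi->isDone = ExprEndResult;
        *isnull = true;
        return 0;
    }

    *isnull = false;
    rsi->isDone = ExprMultipleResult;
    return state->next++;
}

/* Call the SRF until ExprEndResult or until "limit" rows were taken. */
static int
collect(CounterState *state, int start, int stop, int limit, int *out)
{
    int         n = 0;

    while (n < limit)
    {
        ReturnSetInfoModel rsi = {ExprSingleResult};
        bool        isnull;
        int         value = counter_srf(state, start, stop, &rsi, &isnull);

        if (rsi.isDone == ExprEndResult)
        {
            assert(isnull);
            break;
        }
        out[n++] = value;
    }
    return n;
}
```

With a limit of 2 over a five-row set, `collect()` returns two rows and leaves `started` set: the state is stranded mid-iteration, which is the case the shutdown callback has to clean up.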

I also fixed the validator to create a fake fcinfo with the correct
fn_oid (the function being validated), matching what PL/pgSQL does.

Patch attached.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

From 622df933f9badc68c39f7b88376427fbbbd2b099 Mon Sep 17 00:00:00 2001
From: Matheus Alcantara <[email protected]>
Date: Fri, 5 Jun 2026 10:51:53 -0300
Subject: [PATCH v1] plpython: Use funccache.c infrastructure for procedure
 caching

PL/Python set-returning functions can crash with a use-after-free when
CREATE OR REPLACE FUNCTION is executed while the SRF is mid-iteration.
The crash occurs because srfstate->savedargs is allocated in proc->mcxt,
which gets deleted when the procedure is invalidated, leaving a dangling
pointer that PLy_function_restore_args() then dereferences.

The fix is to use reference counting to prevent destroying the function
state while it's still in use, similar to what PL/pgSQL has done. This
commit converts PL/Python to use the funccache.c infrastructure
introduced in v18.

The main challenge is that PL/Python uses SFRM_ValuePerCall for SRFs,
where the handler is called multiple times with use_count potentially
returning to zero between calls. SQL functions face the same challenge,
so this commit follows the same approach used in functions.c: maintain
a per-call-site cache struct (PLyProcedureCache) in fn_extra that holds
both the pointer to the long-lived PLyProcedure and the SRF execution
state. The use_count is incremented when we first obtain the procedure
and decremented via a MemoryContextCallback when fn_mcxt is deleted.
For SRFs, we register an ExprContextCallback to clean up iterator state
when the expression context is shut down.

Since fn_extra is now used for PLyProcedureCache, this commit removes
the SRF macros (SRF_IS_FIRSTCALL, SRF_RETURN_NEXT, etc.) and switches to
direct isDone signaling via ReturnSetInfo, matching how SQL functions
handle ValuePerCall mode.

Author: Matheus Alcantara <[email protected]>
Reported-by: Andrzej Doros <[email protected]>
Suggested-by: Tom Lane <[email protected]>
Discussion: https://www.postgresql.org/message-id/19480-f1f9fdce30462fc4%40postgresql.org
---
 src/pl/plpython/plpy_exec.c      | 160 +++++++++++---------
 src/pl/plpython/plpy_exec.h      |   2 +-
 src/pl/plpython/plpy_main.c      |  88 ++++++-----
 src/pl/plpython/plpy_procedure.c | 248 +++++++++++++++++--------------
 src/pl/plpython/plpy_procedure.h |  51 ++++---
 src/tools/pgindent/typedefs.list |   1 +
 6 files changed, 305 insertions(+), 245 deletions(-)

diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index de0dad1f533..5cbcb031fb3 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -22,22 +22,14 @@
 #include "utils/fmgrprotos.h"
 #include "utils/rel.h"
 
-/* saved state for a set-returning function */
-typedef struct PLySRFState
-{
-	PyObject   *iter;			/* Python iterator producing results */
-	PLySavedArgs *savedargs;	/* function argument values */
-	MemoryContextCallback callback; /* for releasing refcounts when done */
-} PLySRFState;
-
 static PyObject *PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc);
-static PLySavedArgs *PLy_function_save_args(PLyProcedure *proc);
+static PLySavedArgs *PLy_function_save_args(MemoryContext mctx, PLyProcedure *proc);
 static void PLy_function_restore_args(PLyProcedure *proc, PLySavedArgs *savedargs);
 static void PLy_function_drop_args(PLySavedArgs *savedargs);
 static void PLy_global_args_push(PLyProcedure *proc);
 static void PLy_global_args_pop(PLyProcedure *proc);
-static void plpython_srf_cleanup_callback(void *arg);
 static void plpython_return_error_callback(void *arg);
+static void ShutdownPLyFunction(Datum arg);
 
 static PyObject *PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc,
 										HeapTuple *rv);
@@ -51,14 +43,15 @@ static void PLy_abort_open_subtransactions(int save_subxact_level);
 
 /* function subhandler */
 Datum
-PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
+PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedureCache *pcache)
 {
+	PLyProcedure *proc = pcache->proc;
 	bool		is_setof = proc->is_setof;
 	Datum		rv;
 	PyObject   *volatile plargs = NULL;
 	PyObject   *volatile plrv = NULL;
-	FuncCallContext *volatile funcctx = NULL;
 	PLySRFState *volatile srfstate = NULL;
+	ReturnSetInfo *rsi = NULL;
 	ErrorContextCallback plerrcontext;
 
 	/*
@@ -72,25 +65,32 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 	{
 		if (is_setof)
 		{
-			/* First Call setup */
-			if (SRF_IS_FIRSTCALL())
+			rsi = (ReturnSetInfo *) fcinfo->resultinfo;
+
+			/* First call setup */
+			if (pcache->srfstate == NULL)
 			{
-				funcctx = SRF_FIRSTCALL_INIT();
-				srfstate = (PLySRFState *)
-					MemoryContextAllocZero(funcctx->multi_call_memory_ctx,
-										   sizeof(PLySRFState));
-				/* Immediately register cleanup callback */
-				srfstate->callback.func = plpython_srf_cleanup_callback;
-				srfstate->callback.arg = srfstate;
-				MemoryContextRegisterResetCallback(funcctx->multi_call_memory_ctx,
-												   &srfstate->callback);
-				funcctx->user_fctx = srfstate;
+				if (!rsi || !IsA(rsi, ReturnSetInfo) ||
+					(rsi->allowedModes & SFRM_ValuePerCall) == 0)
+				{
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("unsupported set function return mode"),
+							 errdetail("PL/Python set-returning functions only support returning one value per call.")));
+				}
+				rsi->returnMode = SFRM_ValuePerCall;
+
+				pcache->srfstate = (PLySRFState *)
+					MemoryContextAllocZero(pcache->fcontext, sizeof(PLySRFState));
+
+				/* Register shutdown callback to clean up at end of expression */
+				RegisterExprContextCallback(rsi->econtext,
+											ShutdownPLyFunction,
+											PointerGetDatum(pcache));
+				pcache->shutdown_reg = true;
 			}
-			/* Every call setup */
-			funcctx = SRF_PERCALL_SETUP();
-			Assert(funcctx != NULL);
-			srfstate = (PLySRFState *) funcctx->user_fctx;
-			Assert(srfstate != NULL);
+
+			srfstate = pcache->srfstate;
 		}
 
 		if (srfstate == NULL || srfstate->iter == NULL)
@@ -127,20 +127,7 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 		{
 			if (srfstate->iter == NULL)
 			{
-				/* first time -- do checks and setup */
-				ReturnSetInfo *rsi = (ReturnSetInfo *) fcinfo->resultinfo;
-
-				if (!rsi || !IsA(rsi, ReturnSetInfo) ||
-					(rsi->allowedModes & SFRM_ValuePerCall) == 0)
-				{
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("unsupported set function return mode"),
-							 errdetail("PL/Python set-returning functions only support returning one value per call.")));
-				}
-				rsi->returnMode = SFRM_ValuePerCall;
-
-				/* Make iterator out of returned object */
+				/* first time -- make iterator out of returned object */
 				srfstate->iter = PyObject_GetIter(plrv);
 
 				Py_DECREF(plrv);
@@ -177,7 +164,7 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 				 * this again each time in case the iterator is changing those
 				 * values.
 				 */
-				srfstate->savedargs = PLy_function_save_args(proc);
+				srfstate->savedargs = PLy_function_save_args(pcache->fcontext, proc);
 			}
 		}
 
@@ -263,8 +250,8 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 		 * If there was an error within a SRF, the iterator might not have
 		 * been exhausted yet.  Clear it so the next invocation of the
 		 * function will start the iteration again.  (This code is probably
-		 * unnecessary now; plpython_srf_cleanup_callback should take care of
-		 * cleanup.  But it doesn't hurt anything to do it here.)
+		 * unnecessary now; ShutdownPLyFunction should take care of cleanup.
+		 * But it doesn't hurt anything to do it here.)
 		 */
 		if (srfstate)
 		{
@@ -290,22 +277,66 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 
 	if (srfstate)
 	{
-		/* We're in a SRF, exit appropriately */
+		/* We're in a SRF, signal via rsi->isDone */
 		if (srfstate->iter == NULL)
 		{
-			/* Iterator exhausted, so we're done */
-			SRF_RETURN_DONE(funcctx);
+			/*
+			 * Iterator exhausted.  Unregister the shutdown callback since
+			 * we're done normally, then clean up srfstate.
+			 */
+			if (pcache->shutdown_reg)
+			{
+				UnregisterExprContextCallback(rsi->econtext,
+											  ShutdownPLyFunction,
+											  PointerGetDatum(pcache));
+				pcache->shutdown_reg = false;
+			}
+			pfree(pcache->srfstate);
+			pcache->srfstate = NULL;
+
+			rsi->isDone = ExprEndResult;
+			fcinfo->isnull = true;
+			return (Datum) 0;
 		}
-		else if (fcinfo->isnull)
-			SRF_RETURN_NEXT_NULL(funcctx);
 		else
-			SRF_RETURN_NEXT(funcctx, rv);
+		{
+			rsi->isDone = ExprMultipleResult;
+			return rv;
+		}
 	}
 
 	/* Plain function, just return the Datum value (possibly null) */
 	return rv;
 }
 
+/*
+ * Callback function invoked when an expression context holding a SRF
+ * is shut down.  This cleans up any Python iterator state.
+ */
+static void
+ShutdownPLyFunction(Datum arg)
+{
+	PLyProcedureCache *pcache = (PLyProcedureCache *) DatumGetPointer(arg);
+	PLySRFState *srfstate = pcache->srfstate;
+
+	pcache->shutdown_reg = false;
+
+	if (srfstate != NULL)
+	{
+		/* Release the Python iterator if still active */
+		Py_XDECREF(srfstate->iter);
+		srfstate->iter = NULL;
+
+		/* Drop any saved args */
+		if (srfstate->savedargs)
+			PLy_function_drop_args(srfstate->savedargs);
+		srfstate->savedargs = NULL;
+
+		pfree(srfstate);
+		pcache->srfstate = NULL;
+	}
+}
+
 /*
  * trigger subhandler
  *
@@ -536,13 +567,13 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
  * available via the proc's globals :-( ... but we're stuck with that now.
  */
 static PLySavedArgs *
-PLy_function_save_args(PLyProcedure *proc)
+PLy_function_save_args(MemoryContext mctx, PLyProcedure *proc)
 {
 	PLySavedArgs *result;
 
-	/* saved args are always allocated in procedure's context */
+	/* saved args are allocated in the caller-supplied context */
 	result = (PLySavedArgs *)
-		MemoryContextAllocZero(proc->mcxt,
+		MemoryContextAllocZero(mctx,
 							   offsetof(PLySavedArgs, namedargs) +
 							   proc->nargs * sizeof(PyObject *));
 	result->nargs = proc->nargs;
@@ -659,7 +690,7 @@ PLy_global_args_push(PLyProcedure *proc)
 		PLySavedArgs *node;
 
 		/* Build a struct containing current argument values */
-		node = PLy_function_save_args(proc);
+		node = PLy_function_save_args(proc->mcxt, proc);
 
 		/*
 		 * Push the saved argument values into the procedure's stack.  Once we
@@ -713,25 +744,6 @@ PLy_global_args_pop(PLyProcedure *proc)
 	}
 }
 
-/*
- * Memory context deletion callback for cleaning up a PLySRFState.
- * We need this in case execution of the SRF is terminated early,
- * due to error or the caller simply not running it to completion.
- */
-static void
-plpython_srf_cleanup_callback(void *arg)
-{
-	PLySRFState *srfstate = (PLySRFState *) arg;
-
-	/* Release refcount on the iter, if we still have one */
-	Py_XDECREF(srfstate->iter);
-	srfstate->iter = NULL;
-	/* And drop any saved args; we won't need them */
-	if (srfstate->savedargs)
-		PLy_function_drop_args(srfstate->savedargs);
-	srfstate->savedargs = NULL;
-}
-
 static void
 plpython_return_error_callback(void *arg)
 {
diff --git a/src/pl/plpython/plpy_exec.h b/src/pl/plpython/plpy_exec.h
index f35eabbd8ee..1ade1bae151 100644
--- a/src/pl/plpython/plpy_exec.h
+++ b/src/pl/plpython/plpy_exec.h
@@ -7,7 +7,7 @@
 
 #include "plpy_procedure.h"
 
-extern Datum PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc);
+extern Datum PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedureCache *pcache);
 extern HeapTuple PLy_exec_trigger(FunctionCallInfo fcinfo, PLyProcedure *proc);
 extern void PLy_exec_event_trigger(FunctionCallInfo fcinfo, PLyProcedure *proc);
 
diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c
index 9f07c115f80..2ed9abab15b 100644
--- a/src/pl/plpython/plpy_main.c
+++ b/src/pl/plpython/plpy_main.c
@@ -39,7 +39,6 @@ PG_FUNCTION_INFO_V1(plpython3_call_handler);
 PG_FUNCTION_INFO_V1(plpython3_inline_handler);
 
 
-static PLyTrigType PLy_procedure_is_trigger(Form_pg_proc procStruct);
 static void plpython_error_callback(void *arg);
 static void plpython_inline_error_callback(void *arg);
 
@@ -103,8 +102,6 @@ _PG_init(void)
 
 	Py_DECREF(main_mod);
 
-	init_procedure_caches();
-
 	explicit_subtransactions = NIL;
 
 	PLy_execution_contexts = NULL;
@@ -113,10 +110,15 @@ _PG_init(void)
 Datum
 plpython3_validator(PG_FUNCTION_ARGS)
 {
+	LOCAL_FCINFO(fake_fcinfo, 0);
 	Oid			funcoid = PG_GETARG_OID(0);
 	HeapTuple	tuple;
 	Form_pg_proc procStruct;
-	PLyTrigType is_trigger;
+	FmgrInfo	flinfo;
+	TriggerData trigdata;
+	EventTriggerData etrigdata;
+	bool		is_trigger = false;
+	bool		is_event_trigger = false;
 
 	if (!CheckFunctionValidatorAccess(fcinfo->flinfo->fn_oid, funcoid))
 		PG_RETURN_VOID();
@@ -130,12 +132,33 @@ plpython3_validator(PG_FUNCTION_ARGS)
 		elog(ERROR, "cache lookup failed for function %u", funcoid);
 	procStruct = (Form_pg_proc) GETSTRUCT(tuple);
 
-	is_trigger = PLy_procedure_is_trigger(procStruct);
+	if (procStruct->prorettype == TRIGGEROID)
+		is_trigger = true;
+	else if (procStruct->prorettype == EVENT_TRIGGEROID)
+		is_event_trigger = true;
 
 	ReleaseSysCache(tuple);
 
-	/* We can't validate triggers against any particular table ... */
-	(void) PLy_procedure_get(funcoid, InvalidOid, is_trigger);
+	MemSet(fake_fcinfo, 0, SizeForFunctionCallInfo(0));
+	MemSet(&flinfo, 0, sizeof(flinfo));
+	fake_fcinfo->flinfo = &flinfo;
+	flinfo.fn_oid = funcoid;
+	flinfo.fn_mcxt = CurrentMemoryContext;
+
+	if (is_trigger)
+	{
+		MemSet(&trigdata, 0, sizeof(trigdata));
+		trigdata.type = T_TriggerData;
+		fake_fcinfo->context = (Node *) &trigdata;
+	}
+	else if (is_event_trigger)
+	{
+		MemSet(&etrigdata, 0, sizeof(etrigdata));
+		etrigdata.type = T_EventTriggerData;
+		fake_fcinfo->context = (Node *) &etrigdata;
+	}
+
+	(void) PLy_procedure_get(fake_fcinfo, true);
 
 	PG_RETURN_VOID();
 }
@@ -143,6 +166,7 @@ plpython3_validator(PG_FUNCTION_ARGS)
 Datum
 plpython3_call_handler(PG_FUNCTION_ARGS)
 {
+	PLyProcedureCache *proc;
 	bool		nonatomic;
 	Datum		retval;
 	PLyExecutionContext *exec_ctx;
@@ -162,11 +186,10 @@ plpython3_call_handler(PG_FUNCTION_ARGS)
 	 */
 	exec_ctx = PLy_push_execution_context(!nonatomic);
 
+	proc = PLy_procedure_get(fcinfo, false);
+
 	PG_TRY();
 	{
-		Oid			funcoid = fcinfo->flinfo->fn_oid;
-		PLyProcedure *proc;
-
 		/*
 		 * Setup error traceback support for ereport().  Note that the PG_TRY
 		 * structure pops this for us again at exit, so we needn't do that
@@ -180,32 +203,30 @@ plpython3_call_handler(PG_FUNCTION_ARGS)
 
 		if (CALLED_AS_TRIGGER(fcinfo))
 		{
-			Relation	tgrel = ((TriggerData *) fcinfo->context)->tg_relation;
 			HeapTuple	trv;
 
-			proc = PLy_procedure_get(funcoid, RelationGetRelid(tgrel), PLPY_TRIGGER);
-			exec_ctx->curr_proc = proc;
-			trv = PLy_exec_trigger(fcinfo, proc);
+			exec_ctx->curr_proc = proc->proc;
+			trv = PLy_exec_trigger(fcinfo, proc->proc);
 			retval = PointerGetDatum(trv);
 		}
 		else if (CALLED_AS_EVENT_TRIGGER(fcinfo))
 		{
-			proc = PLy_procedure_get(funcoid, InvalidOid, PLPY_EVENT_TRIGGER);
-			exec_ctx->curr_proc = proc;
-			PLy_exec_event_trigger(fcinfo, proc);
+			exec_ctx->curr_proc = proc->proc;
+			PLy_exec_event_trigger(fcinfo, proc->proc);
 			retval = (Datum) 0;
 		}
 		else
 		{
-			proc = PLy_procedure_get(funcoid, InvalidOid, PLPY_NOT_TRIGGER);
-			exec_ctx->curr_proc = proc;
+			exec_ctx->curr_proc = proc->proc;
 			retval = PLy_exec_function(fcinfo, proc);
 		}
 	}
 	PG_CATCH();
 	{
+		/* Destroy the execution context */
 		PLy_pop_execution_context();
 		PyErr_Clear();
+
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
@@ -223,6 +244,7 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 	InlineCodeBlock *codeblock = (InlineCodeBlock *) DatumGetPointer(PG_GETARG_DATUM(0));
 	FmgrInfo	flinfo;
 	PLyProcedure proc;
+	PLyProcedureCache pcache;
 	PLyExecutionContext *exec_ctx;
 	ErrorContextCallback plerrcontext;
 
@@ -248,6 +270,11 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 	 */
 	proc.result.typoid = VOIDOID;
 
+	/* Set up a minimal PLyProcedureCache for the inline block */
+	MemSet(&pcache, 0, sizeof(PLyProcedureCache));
+	pcache.proc = &proc;
+	pcache.fcontext = CurrentMemoryContext;
+
 	/*
 	 * Push execution context onto stack.  It is important that this get
 	 * popped again, so avoid putting anything that could throw error between
@@ -269,7 +296,7 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 
 		PLy_procedure_compile(&proc, codeblock->source_text);
 		exec_ctx->curr_proc = &proc;
-		PLy_exec_function(fake_fcinfo, &proc);
+		PLy_exec_function(fake_fcinfo, &pcache);
 	}
 	PG_CATCH();
 	{
@@ -289,27 +316,6 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
-static PLyTrigType
-PLy_procedure_is_trigger(Form_pg_proc procStruct)
-{
-	PLyTrigType ret;
-
-	switch (procStruct->prorettype)
-	{
-		case TRIGGEROID:
-			ret = PLPY_TRIGGER;
-			break;
-		case EVENT_TRIGGEROID:
-			ret = PLPY_EVENT_TRIGGER;
-			break;
-		default:
-			ret = PLPY_NOT_TRIGGER;
-			break;
-	}
-
-	return ret;
-}
-
 static void
 plpython_error_callback(void *arg)
 {
diff --git a/src/pl/plpython/plpy_procedure.c b/src/pl/plpython/plpy_procedure.c
index 750ba586e0c..02a23e170b3 100644
--- a/src/pl/plpython/plpy_procedure.c
+++ b/src/pl/plpython/plpy_procedure.c
@@ -9,33 +9,31 @@
 #include "access/htup_details.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/event_trigger.h"
+#include "commands/trigger.h"
 #include "funcapi.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
 #include "plpy_procedure.h"
 #include "plpy_util.h"
 #include "utils/builtins.h"
-#include "utils/hsearch.h"
+#include "utils/funccache.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/syscache.h"
 
-static HTAB *PLy_procedure_cache = NULL;
-
-static PLyProcedure *PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger);
-static bool PLy_procedure_valid(PLyProcedure *proc, HeapTuple procTup);
+static void PLy_procedure_create(PLyProcedure *proc,
+								 HeapTuple procTup,
+								 Oid fn_oid,
+								 PLyTrigType is_trigger);
 static char *PLy_procedure_munge_source(const char *name, const char *src);
-
-
-void
-init_procedure_caches(void)
-{
-	HASHCTL		hash_ctl;
-
-	hash_ctl.keysize = sizeof(PLyProcedureKey);
-	hash_ctl.entrysize = sizeof(PLyProcedureEntry);
-	PLy_procedure_cache = hash_create("PL/Python procedures", 32, &hash_ctl,
-									  HASH_ELEM | HASH_BLOBS);
-}
+static void PLy_compile_callback(FunctionCallInfo fcinfo,
+								 HeapTuple procTup,
+								 const CachedFunctionHashKey *hashkey,
+								 CachedFunction *cfunc,
+								 bool forValidator);
+static void PLy_delete_callback(CachedFunction *cfunc);
+static void RemovePLyProcedureCache(void *arg);
 
 /*
  * PLy_procedure_name: get the name of the specified procedure.
@@ -51,103 +49,98 @@ PLy_procedure_name(PLyProcedure *proc)
 }
 
 /*
- * PLy_procedure_get: returns a cached PLyProcedure, or creates, stores and
- * returns a new PLyProcedure.
+ * PLy_procedure_get: returns a cached PLyProcedureCache for the function.
  *
- * fn_oid is the OID of the function requested
- * fn_rel is InvalidOid or the relation this function triggers on
- * is_trigger denotes whether the function is a trigger function
+ * The PLyProcedureCache contains a pointer to the long-lived PLyProcedure
+ * (managed by funccache.c) and execution-specific state like SRF state.
  *
- * The reason that both fn_rel and is_trigger need to be passed is that when
- * trigger functions get validated we don't know which relation(s) they'll
- * be used with, so no sensible fn_rel can be passed.  Also, in that case
- * we can't make a cache entry because we can't construct the right cache key.
- * To forestall leakage of the PLyProcedure in such cases, delete it after
- * construction and return NULL.  That's okay because the only caller that
- * would pass that set of values is plpython3_validator, which ignores our
- * result anyway.
+ * For SRFs, if we are resuming execution (srfstate->iter != NULL), we skip
+ * revalidation and continue using the same PLyProcedure to ensure consistent
+ * behavior throughout the SRF execution.
  */
-PLyProcedure *
-PLy_procedure_get(Oid fn_oid, Oid fn_rel, PLyTrigType is_trigger)
+PLyProcedureCache *
+PLy_procedure_get(FunctionCallInfo fcinfo, bool forValidator)
 {
-	bool		use_cache;
-	HeapTuple	procTup;
-	PLyProcedureKey key;
-	PLyProcedureEntry *volatile entry = NULL;
-	PLyProcedure *volatile proc = NULL;
-	bool		found = false;
-
-	if (is_trigger == PLPY_TRIGGER && fn_rel == InvalidOid)
-		use_cache = false;
-	else
-		use_cache = true;
+	PLyProcedure *proc;
+	PLyProcedureCache *pcache;
+	FmgrInfo   *finfo = fcinfo->flinfo;
 
-	procTup = SearchSysCache1(PROCOID, ObjectIdGetDatum(fn_oid));
-	if (!HeapTupleIsValid(procTup))
-		elog(ERROR, "cache lookup failed for function %u", fn_oid);
+	/*
+	 * If this is the first execution for this FmgrInfo, set up a cache struct
+	 * (initially containing null pointers).  The cache must live as long as
+	 * the FmgrInfo, so it goes in fn_mcxt.  Also set up a memory context
+	 * callback that will be invoked when fn_mcxt is deleted.
+	 */
+	pcache = finfo->fn_extra;
+	if (pcache == NULL)
+	{
+		pcache = (PLyProcedureCache *)
+			MemoryContextAllocZero(finfo->fn_mcxt, sizeof(PLyProcedureCache));
+
+		pcache->fcontext = finfo->fn_mcxt;
+		pcache->mcb.func = RemovePLyProcedureCache;
+		pcache->mcb.arg = pcache;
+
+		MemoryContextRegisterResetCallback(finfo->fn_mcxt, &pcache->mcb);
+
+		finfo->fn_extra = pcache;
+	}
 
 	/*
-	 * Look for the function in the cache, unless we don't have the necessary
-	 * information (e.g. during validation). In that case we just don't cache
-	 * anything.
+	 * If we are resuming execution of a set-returning function, just keep
+	 * using the same cache.  We do not ask funccache.c to re-validate the
+	 * PLyProcedure: we want to run to completion using the function's initial
+	 * definition.
 	 */
-	if (use_cache)
+	if (pcache->srfstate != NULL && pcache->srfstate->iter != NULL)
 	{
-		key.fn_oid = fn_oid;
-		key.fn_rel = fn_rel;
-		entry = hash_search(PLy_procedure_cache, &key, HASH_ENTER, &found);
-		proc = entry->proc;
+		Assert(pcache->proc != NULL);
+		return pcache;
 	}
 
-	PG_TRY();
+	/*
+	 * Look up, or re-validate, the long-lived hash entry.
+	 */
+	proc = (PLyProcedure *)
+		cached_function_compile(fcinfo,
+								(CachedFunction *) pcache->proc,
+								PLy_compile_callback,
+								PLy_delete_callback,
+								sizeof(PLyProcedure),
+								true,
+								forValidator);
+
+	/*
+	 * Install the hash pointer in the PLyProcedureCache, and increment its
+	 * use count to reflect that.  If cached_function_compile gave us back a
+	 * different hash entry than we were using before, we must decrement that
+	 * one's use count.
+	 */
+	if (proc != pcache->proc)
 	{
-		if (!found)
+		if (pcache->proc != NULL)
 		{
-			/* Haven't found it, create a new procedure */
-			proc = PLy_procedure_create(procTup, fn_oid, is_trigger);
-			if (use_cache)
-				entry->proc = proc;
-			else
-			{
-				/* Delete the proc, otherwise it's a memory leak */
-				PLy_procedure_delete(proc);
-				proc = NULL;
-			}
-		}
-		else if (!PLy_procedure_valid(proc, procTup))
-		{
-			/* Found it, but it's invalid, free and reuse the cache entry */
-			entry->proc = NULL;
-			if (proc)
-				PLy_procedure_delete(proc);
-			proc = PLy_procedure_create(procTup, fn_oid, is_trigger);
-			entry->proc = proc;
+			Assert(pcache->proc->cfunc.use_count > 0);
+			pcache->proc->cfunc.use_count--;
 		}
-		/* Found it and it's valid, it's fine to use it */
-	}
-	PG_CATCH();
-	{
-		/* Do not leave an uninitialized entry in the cache */
-		if (use_cache)
-			hash_search(PLy_procedure_cache, &key, HASH_REMOVE, NULL);
-		PG_RE_THROW();
+		pcache->proc = proc;
+		proc->cfunc.use_count++;
 	}
-	PG_END_TRY();
-
-	ReleaseSysCache(procTup);
 
-	return proc;
+	return pcache;
 }
 
 /*
  * Create a new PLyProcedure structure
  */
-static PLyProcedure *
-PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
+static void
+PLy_procedure_create(PLyProcedure *proc,
+					 HeapTuple procTup,
+					 Oid fn_oid,
+					 PLyTrigType is_trigger)
 {
 	char		procName[NAMEDATALEN + 256];
 	Form_pg_proc procStruct;
-	PLyProcedure *volatile proc;
 	MemoryContext cxt;
 	MemoryContext oldcxt;
 	int			rv;
@@ -177,7 +170,6 @@ PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
 
 	oldcxt = MemoryContextSwitchTo(cxt);
 
-	proc = palloc0_object(PLyProcedure);
 	proc->mcxt = cxt;
 
 	PG_TRY();
@@ -191,8 +183,6 @@ PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
 		proc->proname = pstrdup(NameStr(procStruct->proname));
 		MemoryContextSetIdentifier(cxt, proc->proname);
 		proc->pyname = pstrdup(procName);
-		proc->fn_xmin = HeapTupleHeaderGetRawXmin(procTup->t_data);
-		proc->fn_tid = procTup->t_self;
 		proc->fn_readonly = (procStruct->provolatile != PROVOLATILE_VOLATILE);
 		proc->is_setof = procStruct->proretset;
 		proc->is_procedure = (procStruct->prokind == PROKIND_PROCEDURE);
@@ -355,7 +345,6 @@ PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
 	PG_END_TRY();
 
 	MemoryContextSwitchTo(oldcxt);
-	return proc;
 }
 
 /*
@@ -424,23 +413,6 @@ PLy_procedure_delete(PLyProcedure *proc)
 	MemoryContextDelete(proc->mcxt);
 }
 
-/*
- * Decide whether a cached PLyProcedure struct is still valid
- */
-static bool
-PLy_procedure_valid(PLyProcedure *proc, HeapTuple procTup)
-{
-	if (proc == NULL)
-		return false;
-
-	/* If the pg_proc tuple has changed, it's not valid */
-	if (!(proc->fn_xmin == HeapTupleHeaderGetRawXmin(procTup->t_data) &&
-		  ItemPointerEquals(&proc->fn_tid, &procTup->t_self)))
-		return false;
-
-	return true;
-}
-
 static char *
 PLy_procedure_munge_source(const char *name, const char *src)
 {
@@ -485,3 +457,57 @@ PLy_procedure_munge_source(const char *name, const char *src)
 
 	return mrc;
 }
+
+static void
+PLy_compile_callback(FunctionCallInfo fcinfo,
+					 HeapTuple procTup,
+					 const CachedFunctionHashKey *hashkey,
+					 CachedFunction *cfunc,
+					 bool forValidator)
+{
+	PLyProcedure *proc = (PLyProcedure *) cfunc;
+	PLyTrigType is_trigger;
+	Oid			fn_oid = fcinfo->flinfo->fn_oid;
+
+	if (CALLED_AS_TRIGGER(fcinfo))
+		is_trigger = PLPY_TRIGGER;
+	else if (CALLED_AS_EVENT_TRIGGER(fcinfo))
+		is_trigger = PLPY_EVENT_TRIGGER;
+	else
+		is_trigger = PLPY_NOT_TRIGGER;
+
+	PLy_procedure_create(proc, procTup, fn_oid, is_trigger);
+}
+
+static void
+PLy_delete_callback(CachedFunction *cfunc)
+{
+	PLyProcedure *proc = (PLyProcedure *) cfunc;
+
+	Assert(proc->cfunc.use_count == 0);
+	Assert(proc->calldepth == 0);
+
+	PLy_procedure_delete(proc);
+}
+
+/*
+ * MemoryContext callback function
+ *
+ * We register this in the memory context that contains a PLyProcedureCache
+ * struct.  When the memory context is reset or deleted, we release the
+ * reference count (if any) that the cache holds on the long-lived hash entry.
+ * Note that this will happen even during error aborts.
+ */
+static void
+RemovePLyProcedureCache(void *arg)
+{
+	PLyProcedureCache *pcache = (PLyProcedureCache *) arg;
+
+	/* Release reference count on PLyProcedure */
+	if (pcache->proc != NULL)
+	{
+		Assert(pcache->proc->cfunc.use_count > 0);
+		pcache->proc->cfunc.use_count--;
+		pcache->proc = NULL;
+	}
+}
diff --git a/src/pl/plpython/plpy_procedure.h b/src/pl/plpython/plpy_procedure.h
index 3ef22844a9b..4527b783897 100644
--- a/src/pl/plpython/plpy_procedure.h
+++ b/src/pl/plpython/plpy_procedure.h
@@ -6,9 +6,7 @@
 #define PLPY_PROCEDURE_H
 
 #include "plpy_typeio.h"
-
-
-extern void init_procedure_caches(void);
+#include "utils/funccache.h"
 
 
 /*
@@ -31,15 +29,28 @@ typedef struct PLySavedArgs
 	PyObject   *namedargs[FLEXIBLE_ARRAY_MEMBER];	/* named args */
 } PLySavedArgs;
 
-/* cached procedure data */
+/* saved state for a set-returning function */
+typedef struct PLySRFState
+{
+	PyObject   *iter;			/* Python iterator producing results */
+	PLySavedArgs *savedargs;	/* function argument values */
+} PLySRFState;
+
+/*
+ * Long-lived data for a PL/Python function.
+ *
+ * This struct is managed by funccache.c and can be shared across multiple
+ * executions of the same function.  It must contain no execution-specific
+ * state.  The CachedFunction struct must be first so we can cast between them.
+ */
 typedef struct PLyProcedure
 {
+	CachedFunction cfunc;		/* fields managed by funccache.c */
+
 	MemoryContext mcxt;			/* context holding this PLyProcedure and its
 								 * subsidiary data */
 	char	   *proname;		/* SQL name of procedure */
 	char	   *pyname;			/* Python name of procedure */
-	TransactionId fn_xmin;
-	ItemPointerData fn_tid;
 	bool		fn_readonly;
 	bool		is_setof;		/* true, if function returns result set */
 	bool		is_procedure;
@@ -59,23 +70,27 @@ typedef struct PLyProcedure
 	PLySavedArgs *argstack;		/* stack of outer-level call arguments */
 } PLyProcedure;
 
-/* the procedure cache key */
-typedef struct PLyProcedureKey
+/*
+ * Per-call-site cache for a PL/Python function.
+ *
+ * This struct is stored in fn_extra and holds execution-specific state,
+ * including a pointer to the long-lived PLyProcedure.  The use_count in
+ * the PLyProcedure is incremented while we hold a reference.
+ */
+typedef struct PLyProcedureCache
 {
-	Oid			fn_oid;			/* function OID */
-	Oid			fn_rel;			/* triggered-on relation or InvalidOid */
-} PLyProcedureKey;
+	PLyProcedure *proc;			/* long-lived hash entry */
+	MemoryContext fcontext;		/* fn_mcxt - context holding this struct */
+	PLySRFState *srfstate;		/* SRF execution state, NULL if not in SRF */
+	bool		shutdown_reg;	/* true if registered shutdown callback */
 
-/* the procedure cache entry */
-typedef struct PLyProcedureEntry
-{
-	PLyProcedureKey key;		/* hash key */
-	PLyProcedure *proc;
-} PLyProcedureEntry;
+	/* Callback to release use-count when fcontext is deleted */
+	MemoryContextCallback mcb;
+} PLyProcedureCache;
 
 /* PLyProcedure manipulation */
 extern char *PLy_procedure_name(PLyProcedure *proc);
-extern PLyProcedure *PLy_procedure_get(Oid fn_oid, Oid fn_rel, PLyTrigType is_trigger);
+extern PLyProcedureCache *PLy_procedure_get(FunctionCallInfo fcinfo, bool forValidator);
 extern void PLy_procedure_compile(PLyProcedure *proc, const char *src);
 extern void PLy_procedure_delete(PLyProcedure *proc);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cf40c87043..636c8b27fe7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2074,6 +2074,7 @@ PLyObToTuple
 PLyObject_AsString_t
 PLyPlanObject
 PLyProcedure
+PLyProcedureCache
 PLyProcedureEntry
 PLyProcedureKey
 PLyResultObject
-- 
2.50.1 (Apple Git-155)



Attachments:

  [text/plain] v1-0001-plpython-Use-funccache.c-infrastructure-for-proce.patch (30.1K, 2-v1-0001-plpython-Use-funccache.c-infrastructure-for-proce.patch)
From 622df933f9badc68c39f7b88376427fbbbd2b099 Mon Sep 17 00:00:00 2001
From: Matheus Alcantara <[email protected]>
Date: Fri, 5 Jun 2026 10:51:53 -0300
Subject: [PATCH v1] plpython: Use funccache.c infrastructure for procedure
 caching

PL/Python set-returning functions can crash with a use-after-free when
CREATE OR REPLACE FUNCTION is executed while the SRF is mid-iteration.
The crash occurs because srfstate->savedargs is allocated in proc->mcxt,
which gets deleted when the procedure is invalidated, leaving a dangling
pointer that PLy_function_restore_args() then dereferences.

The fix is to use reference counting to prevent destroying the function
state while it's still in use, similar to what PL/pgSQL has done. This
commit converts PL/Python to use the funccache.c infrastructure
introduced in v18.

The main challenge is that PL/Python uses SFRM_ValuePerCall for SRFs,
where the handler is called multiple times with use_count potentially
returning to zero between calls. SQL functions face the same challenge,
so this commit follows the same approach used in functions.c: maintain
a per-call-site cache struct (PLyProcedureCache) in fn_extra that holds
both the pointer to the long-lived PLyProcedure and the SRF execution
state. The use_count is incremented when we first obtain the procedure
and decremented via a MemoryContextCallback when fn_mcxt is deleted.
For SRFs, we register an ExprContextCallback to clean up iterator state
when the expression context is shut down.

Since fn_extra is now used for PLyProcedureCache, and the funcapi.h SRF
macros (SRF_IS_FIRSTCALL, SRF_RETURN_NEXT, etc.) keep their own
FuncCallContext in fn_extra, this commit stops using those macros and
switches to direct isDone signaling via ReturnSetInfo, matching how SQL
functions handle ValuePerCall mode.

Author: Matheus Alcantara <[email protected]>
Reported-by: Andrzej Doros <[email protected]>
Suggested-by: Tom Lane <[email protected]>
Discussion: https://www.postgresql.org/message-id/19480-f1f9fdce30462fc4%40postgresql.org
---
 src/pl/plpython/plpy_exec.c      | 160 +++++++++++---------
 src/pl/plpython/plpy_exec.h      |   2 +-
 src/pl/plpython/plpy_main.c      |  88 ++++++-----
 src/pl/plpython/plpy_procedure.c | 248 +++++++++++++++++--------------
 src/pl/plpython/plpy_procedure.h |  51 ++++---
 src/tools/pgindent/typedefs.list |   1 +
 6 files changed, 305 insertions(+), 245 deletions(-)

diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index de0dad1f533..5cbcb031fb3 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -22,22 +22,14 @@
 #include "utils/fmgrprotos.h"
 #include "utils/rel.h"
 
-/* saved state for a set-returning function */
-typedef struct PLySRFState
-{
-	PyObject   *iter;			/* Python iterator producing results */
-	PLySavedArgs *savedargs;	/* function argument values */
-	MemoryContextCallback callback; /* for releasing refcounts when done */
-} PLySRFState;
-
 static PyObject *PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc);
-static PLySavedArgs *PLy_function_save_args(PLyProcedure *proc);
+static PLySavedArgs *PLy_function_save_args(MemoryContext mctx, PLyProcedure *proc);
 static void PLy_function_restore_args(PLyProcedure *proc, PLySavedArgs *savedargs);
 static void PLy_function_drop_args(PLySavedArgs *savedargs);
 static void PLy_global_args_push(PLyProcedure *proc);
 static void PLy_global_args_pop(PLyProcedure *proc);
-static void plpython_srf_cleanup_callback(void *arg);
 static void plpython_return_error_callback(void *arg);
+static void ShutdownPLyFunction(Datum arg);
 
 static PyObject *PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc,
 										HeapTuple *rv);
@@ -51,14 +43,15 @@ static void PLy_abort_open_subtransactions(int save_subxact_level);
 
 /* function subhandler */
 Datum
-PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
+PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedureCache *pcache)
 {
+	PLyProcedure *proc = pcache->proc;
 	bool		is_setof = proc->is_setof;
 	Datum		rv;
 	PyObject   *volatile plargs = NULL;
 	PyObject   *volatile plrv = NULL;
-	FuncCallContext *volatile funcctx = NULL;
 	PLySRFState *volatile srfstate = NULL;
+	ReturnSetInfo *rsi = NULL;
 	ErrorContextCallback plerrcontext;
 
 	/*
@@ -72,25 +65,32 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 	{
 		if (is_setof)
 		{
-			/* First Call setup */
-			if (SRF_IS_FIRSTCALL())
+			rsi = (ReturnSetInfo *) fcinfo->resultinfo;
+
+			/* First call setup */
+			if (pcache->srfstate == NULL)
 			{
-				funcctx = SRF_FIRSTCALL_INIT();
-				srfstate = (PLySRFState *)
-					MemoryContextAllocZero(funcctx->multi_call_memory_ctx,
-										   sizeof(PLySRFState));
-				/* Immediately register cleanup callback */
-				srfstate->callback.func = plpython_srf_cleanup_callback;
-				srfstate->callback.arg = srfstate;
-				MemoryContextRegisterResetCallback(funcctx->multi_call_memory_ctx,
-												   &srfstate->callback);
-				funcctx->user_fctx = srfstate;
+				if (!rsi || !IsA(rsi, ReturnSetInfo) ||
+					(rsi->allowedModes & SFRM_ValuePerCall) == 0)
+				{
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("unsupported set function return mode"),
+							 errdetail("PL/Python set-returning functions only support returning one value per call.")));
+				}
+				rsi->returnMode = SFRM_ValuePerCall;
+
+				pcache->srfstate = (PLySRFState *)
+					MemoryContextAllocZero(pcache->fcontext, sizeof(PLySRFState));
+
+				/* Register shutdown callback to clean up at end of expression */
+				RegisterExprContextCallback(rsi->econtext,
+											ShutdownPLyFunction,
+											PointerGetDatum(pcache));
+				pcache->shutdown_reg = true;
 			}
-			/* Every call setup */
-			funcctx = SRF_PERCALL_SETUP();
-			Assert(funcctx != NULL);
-			srfstate = (PLySRFState *) funcctx->user_fctx;
-			Assert(srfstate != NULL);
+
+			srfstate = pcache->srfstate;
 		}
 
 		if (srfstate == NULL || srfstate->iter == NULL)
@@ -127,20 +127,7 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 		{
 			if (srfstate->iter == NULL)
 			{
-				/* first time -- do checks and setup */
-				ReturnSetInfo *rsi = (ReturnSetInfo *) fcinfo->resultinfo;
-
-				if (!rsi || !IsA(rsi, ReturnSetInfo) ||
-					(rsi->allowedModes & SFRM_ValuePerCall) == 0)
-				{
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("unsupported set function return mode"),
-							 errdetail("PL/Python set-returning functions only support returning one value per call.")));
-				}
-				rsi->returnMode = SFRM_ValuePerCall;
-
-				/* Make iterator out of returned object */
+				/* first time -- make iterator out of returned object */
 				srfstate->iter = PyObject_GetIter(plrv);
 
 				Py_DECREF(plrv);
@@ -177,7 +164,7 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 				 * this again each time in case the iterator is changing those
 				 * values.
 				 */
-				srfstate->savedargs = PLy_function_save_args(proc);
+				srfstate->savedargs = PLy_function_save_args(pcache->fcontext, proc);
 			}
 		}
 
@@ -263,8 +250,8 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 		 * If there was an error within a SRF, the iterator might not have
 		 * been exhausted yet.  Clear it so the next invocation of the
 		 * function will start the iteration again.  (This code is probably
-		 * unnecessary now; plpython_srf_cleanup_callback should take care of
-		 * cleanup.  But it doesn't hurt anything to do it here.)
+		 * unnecessary now; ShutdownPLyFunction should take care of cleanup.
+		 * But it doesn't hurt anything to do it here.)
 		 */
 		if (srfstate)
 		{
@@ -290,22 +277,66 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 
 	if (srfstate)
 	{
-		/* We're in a SRF, exit appropriately */
+		/* We're in a SRF, signal via rsi->isDone */
 		if (srfstate->iter == NULL)
 		{
-			/* Iterator exhausted, so we're done */
-			SRF_RETURN_DONE(funcctx);
+			/*
+			 * Iterator exhausted.  Unregister the shutdown callback since
+			 * we're done normally, then clean up srfstate.
+			 */
+			if (pcache->shutdown_reg)
+			{
+				UnregisterExprContextCallback(rsi->econtext,
+											  ShutdownPLyFunction,
+											  PointerGetDatum(pcache));
+				pcache->shutdown_reg = false;
+			}
+			pfree(pcache->srfstate);
+			pcache->srfstate = NULL;
+
+			rsi->isDone = ExprEndResult;
+			fcinfo->isnull = true;
+			return (Datum) 0;
 		}
-		else if (fcinfo->isnull)
-			SRF_RETURN_NEXT_NULL(funcctx);
 		else
-			SRF_RETURN_NEXT(funcctx, rv);
+		{
+			rsi->isDone = ExprMultipleResult;
+			return rv;
+		}
 	}
 
 	/* Plain function, just return the Datum value (possibly null) */
 	return rv;
 }
 
+/*
+ * Callback function invoked when an expression context holding a SRF
+ * is shut down.  This cleans up any Python iterator state.
+ */
+static void
+ShutdownPLyFunction(Datum arg)
+{
+	PLyProcedureCache *pcache = (PLyProcedureCache *) DatumGetPointer(arg);
+	PLySRFState *srfstate = pcache->srfstate;
+
+	pcache->shutdown_reg = false;
+
+	if (srfstate != NULL)
+	{
+		/* Release the Python iterator if still active */
+		Py_XDECREF(srfstate->iter);
+		srfstate->iter = NULL;
+
+		/* Drop any saved args */
+		if (srfstate->savedargs)
+			PLy_function_drop_args(srfstate->savedargs);
+		srfstate->savedargs = NULL;
+
+		pfree(srfstate);
+		pcache->srfstate = NULL;
+	}
+}
+
 /*
  * trigger subhandler
  *
@@ -536,13 +567,13 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
  * available via the proc's globals :-( ... but we're stuck with that now.
  */
 static PLySavedArgs *
-PLy_function_save_args(PLyProcedure *proc)
+PLy_function_save_args(MemoryContext mctx, PLyProcedure *proc)
 {
 	PLySavedArgs *result;
 
 	/* saved args are always allocated in procedure's context */
 	result = (PLySavedArgs *)
-		MemoryContextAllocZero(proc->mcxt,
+		MemoryContextAllocZero(mctx,
 							   offsetof(PLySavedArgs, namedargs) +
 							   proc->nargs * sizeof(PyObject *));
 	result->nargs = proc->nargs;
@@ -659,7 +690,7 @@ PLy_global_args_push(PLyProcedure *proc)
 		PLySavedArgs *node;
 
 		/* Build a struct containing current argument values */
-		node = PLy_function_save_args(proc);
+		node = PLy_function_save_args(proc->mcxt, proc);
 
 		/*
 		 * Push the saved argument values into the procedure's stack.  Once we
@@ -713,25 +744,6 @@ PLy_global_args_pop(PLyProcedure *proc)
 	}
 }
 
-/*
- * Memory context deletion callback for cleaning up a PLySRFState.
- * We need this in case execution of the SRF is terminated early,
- * due to error or the caller simply not running it to completion.
- */
-static void
-plpython_srf_cleanup_callback(void *arg)
-{
-	PLySRFState *srfstate = (PLySRFState *) arg;
-
-	/* Release refcount on the iter, if we still have one */
-	Py_XDECREF(srfstate->iter);
-	srfstate->iter = NULL;
-	/* And drop any saved args; we won't need them */
-	if (srfstate->savedargs)
-		PLy_function_drop_args(srfstate->savedargs);
-	srfstate->savedargs = NULL;
-}
-
 static void
 plpython_return_error_callback(void *arg)
 {
diff --git a/src/pl/plpython/plpy_exec.h b/src/pl/plpython/plpy_exec.h
index f35eabbd8ee..1ade1bae151 100644
--- a/src/pl/plpython/plpy_exec.h
+++ b/src/pl/plpython/plpy_exec.h
@@ -7,7 +7,7 @@
 
 #include "plpy_procedure.h"
 
-extern Datum PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc);
+extern Datum PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedureCache *pcache);
 extern HeapTuple PLy_exec_trigger(FunctionCallInfo fcinfo, PLyProcedure *proc);
 extern void PLy_exec_event_trigger(FunctionCallInfo fcinfo, PLyProcedure *proc);
 
diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c
index 9f07c115f80..2ed9abab15b 100644
--- a/src/pl/plpython/plpy_main.c
+++ b/src/pl/plpython/plpy_main.c
@@ -39,7 +39,6 @@ PG_FUNCTION_INFO_V1(plpython3_call_handler);
 PG_FUNCTION_INFO_V1(plpython3_inline_handler);
 
 
-static PLyTrigType PLy_procedure_is_trigger(Form_pg_proc procStruct);
 static void plpython_error_callback(void *arg);
 static void plpython_inline_error_callback(void *arg);
 
@@ -103,8 +102,6 @@ _PG_init(void)
 
 	Py_DECREF(main_mod);
 
-	init_procedure_caches();
-
 	explicit_subtransactions = NIL;
 
 	PLy_execution_contexts = NULL;
@@ -113,10 +110,15 @@ _PG_init(void)
 Datum
 plpython3_validator(PG_FUNCTION_ARGS)
 {
+	LOCAL_FCINFO(fake_fcinfo, 0);
 	Oid			funcoid = PG_GETARG_OID(0);
 	HeapTuple	tuple;
 	Form_pg_proc procStruct;
-	PLyTrigType is_trigger;
+	FmgrInfo	flinfo;
+	TriggerData trigdata;
+	EventTriggerData etrigdata;
+	bool		is_trigger = false;
+	bool		is_event_trigger = false;
 
 	if (!CheckFunctionValidatorAccess(fcinfo->flinfo->fn_oid, funcoid))
 		PG_RETURN_VOID();
@@ -130,12 +132,33 @@ plpython3_validator(PG_FUNCTION_ARGS)
 		elog(ERROR, "cache lookup failed for function %u", funcoid);
 	procStruct = (Form_pg_proc) GETSTRUCT(tuple);
 
-	is_trigger = PLy_procedure_is_trigger(procStruct);
+	if (procStruct->prorettype == TRIGGEROID)
+		is_trigger = true;
+	else if (procStruct->prorettype == EVENT_TRIGGEROID)
+		is_event_trigger = true;
 
 	ReleaseSysCache(tuple);
 
-	/* We can't validate triggers against any particular table ... */
-	(void) PLy_procedure_get(funcoid, InvalidOid, is_trigger);
+	MemSet(fake_fcinfo, 0, SizeForFunctionCallInfo(0));
+	MemSet(&flinfo, 0, sizeof(flinfo));
+	fake_fcinfo->flinfo = &flinfo;
+	flinfo.fn_oid = funcoid;
+	flinfo.fn_mcxt = CurrentMemoryContext;
+
+	if (is_trigger)
+	{
+		MemSet(&trigdata, 0, sizeof(trigdata));
+		trigdata.type = T_TriggerData;
+		fake_fcinfo->context = (Node *) &trigdata;
+	}
+	else if (is_event_trigger)
+	{
+		MemSet(&etrigdata, 0, sizeof(etrigdata));
+		etrigdata.type = T_EventTriggerData;
+		fake_fcinfo->context = (Node *) &etrigdata;
+	}
+
+	(void) PLy_procedure_get(fake_fcinfo, true);
 
 	PG_RETURN_VOID();
 }
@@ -143,6 +166,7 @@ plpython3_validator(PG_FUNCTION_ARGS)
 Datum
 plpython3_call_handler(PG_FUNCTION_ARGS)
 {
+	PLyProcedureCache *proc;
 	bool		nonatomic;
 	Datum		retval;
 	PLyExecutionContext *exec_ctx;
@@ -162,11 +186,10 @@ plpython3_call_handler(PG_FUNCTION_ARGS)
 	 */
 	exec_ctx = PLy_push_execution_context(!nonatomic);
 
+	proc = PLy_procedure_get(fcinfo, false);
+
 	PG_TRY();
 	{
-		Oid			funcoid = fcinfo->flinfo->fn_oid;
-		PLyProcedure *proc;
-
 		/*
 		 * Setup error traceback support for ereport().  Note that the PG_TRY
 		 * structure pops this for us again at exit, so we needn't do that
@@ -180,32 +203,30 @@ plpython3_call_handler(PG_FUNCTION_ARGS)
 
 		if (CALLED_AS_TRIGGER(fcinfo))
 		{
-			Relation	tgrel = ((TriggerData *) fcinfo->context)->tg_relation;
 			HeapTuple	trv;
 
-			proc = PLy_procedure_get(funcoid, RelationGetRelid(tgrel), PLPY_TRIGGER);
-			exec_ctx->curr_proc = proc;
-			trv = PLy_exec_trigger(fcinfo, proc);
+			exec_ctx->curr_proc = proc->proc;
+			trv = PLy_exec_trigger(fcinfo, proc->proc);
 			retval = PointerGetDatum(trv);
 		}
 		else if (CALLED_AS_EVENT_TRIGGER(fcinfo))
 		{
-			proc = PLy_procedure_get(funcoid, InvalidOid, PLPY_EVENT_TRIGGER);
-			exec_ctx->curr_proc = proc;
-			PLy_exec_event_trigger(fcinfo, proc);
+			exec_ctx->curr_proc = proc->proc;
+			PLy_exec_event_trigger(fcinfo, proc->proc);
 			retval = (Datum) 0;
 		}
 		else
 		{
-			proc = PLy_procedure_get(funcoid, InvalidOid, PLPY_NOT_TRIGGER);
-			exec_ctx->curr_proc = proc;
+			exec_ctx->curr_proc = proc->proc;
 			retval = PLy_exec_function(fcinfo, proc);
 		}
 	}
 	PG_CATCH();
 	{
+		/* Destroy the execution context */
 		PLy_pop_execution_context();
 		PyErr_Clear();
+
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
@@ -223,6 +244,7 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 	InlineCodeBlock *codeblock = (InlineCodeBlock *) DatumGetPointer(PG_GETARG_DATUM(0));
 	FmgrInfo	flinfo;
 	PLyProcedure proc;
+	PLyProcedureCache pcache;
 	PLyExecutionContext *exec_ctx;
 	ErrorContextCallback plerrcontext;
 
@@ -248,6 +270,11 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 	 */
 	proc.result.typoid = VOIDOID;
 
+	/* Set up a minimal PLyProcedureCache for the inline block */
+	MemSet(&pcache, 0, sizeof(PLyProcedureCache));
+	pcache.proc = &proc;
+	pcache.fcontext = CurrentMemoryContext;
+
 	/*
 	 * Push execution context onto stack.  It is important that this get
 	 * popped again, so avoid putting anything that could throw error between
@@ -269,7 +296,7 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 
 		PLy_procedure_compile(&proc, codeblock->source_text);
 		exec_ctx->curr_proc = &proc;
-		PLy_exec_function(fake_fcinfo, &proc);
+		PLy_exec_function(fake_fcinfo, &pcache);
 	}
 	PG_CATCH();
 	{
@@ -289,27 +316,6 @@ plpython3_inline_handler(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
-static PLyTrigType
-PLy_procedure_is_trigger(Form_pg_proc procStruct)
-{
-	PLyTrigType ret;
-
-	switch (procStruct->prorettype)
-	{
-		case TRIGGEROID:
-			ret = PLPY_TRIGGER;
-			break;
-		case EVENT_TRIGGEROID:
-			ret = PLPY_EVENT_TRIGGER;
-			break;
-		default:
-			ret = PLPY_NOT_TRIGGER;
-			break;
-	}
-
-	return ret;
-}
-
 static void
 plpython_error_callback(void *arg)
 {
diff --git a/src/pl/plpython/plpy_procedure.c b/src/pl/plpython/plpy_procedure.c
index 750ba586e0c..02a23e170b3 100644
--- a/src/pl/plpython/plpy_procedure.c
+++ b/src/pl/plpython/plpy_procedure.c
@@ -9,33 +9,31 @@
 #include "access/htup_details.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/event_trigger.h"
+#include "commands/trigger.h"
 #include "funcapi.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
 #include "plpy_procedure.h"
 #include "plpy_util.h"
 #include "utils/builtins.h"
-#include "utils/hsearch.h"
+#include "utils/funccache.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/syscache.h"
 
-static HTAB *PLy_procedure_cache = NULL;
-
-static PLyProcedure *PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger);
-static bool PLy_procedure_valid(PLyProcedure *proc, HeapTuple procTup);
+static void PLy_procedure_create(PLyProcedure *proc,
+								 HeapTuple procTup,
+								 Oid fn_oid,
+								 PLyTrigType is_trigger);
 static char *PLy_procedure_munge_source(const char *name, const char *src);
-
-
-void
-init_procedure_caches(void)
-{
-	HASHCTL		hash_ctl;
-
-	hash_ctl.keysize = sizeof(PLyProcedureKey);
-	hash_ctl.entrysize = sizeof(PLyProcedureEntry);
-	PLy_procedure_cache = hash_create("PL/Python procedures", 32, &hash_ctl,
-									  HASH_ELEM | HASH_BLOBS);
-}
+static void PLy_compile_callback(FunctionCallInfo fcinfo,
+								 HeapTuple procTup,
+								 const CachedFunctionHashKey *hashkey,
+								 CachedFunction *cfunc,
+								 bool forValidator);
+static void PLy_delete_callback(CachedFunction *cfunc);
+static void RemovePLyProcedureCache(void *arg);
 
 /*
  * PLy_procedure_name: get the name of the specified procedure.
@@ -51,103 +49,98 @@ PLy_procedure_name(PLyProcedure *proc)
 }
 
 /*
- * PLy_procedure_get: returns a cached PLyProcedure, or creates, stores and
- * returns a new PLyProcedure.
+ * PLy_procedure_get: returns a cached PLyProcedureCache for the function.
  *
- * fn_oid is the OID of the function requested
- * fn_rel is InvalidOid or the relation this function triggers on
- * is_trigger denotes whether the function is a trigger function
+ * The PLyProcedureCache contains a pointer to the long-lived PLyProcedure
+ * (managed by funccache.c) and execution-specific state like SRF state.
  *
- * The reason that both fn_rel and is_trigger need to be passed is that when
- * trigger functions get validated we don't know which relation(s) they'll
- * be used with, so no sensible fn_rel can be passed.  Also, in that case
- * we can't make a cache entry because we can't construct the right cache key.
- * To forestall leakage of the PLyProcedure in such cases, delete it after
- * construction and return NULL.  That's okay because the only caller that
- * would pass that set of values is plpython3_validator, which ignores our
- * result anyway.
+ * For SRFs, if we are resuming execution (srfstate->iter != NULL), we skip
+ * revalidation and continue using the same PLyProcedure to ensure consistent
+ * behavior throughout the SRF execution.
  */
-PLyProcedure *
-PLy_procedure_get(Oid fn_oid, Oid fn_rel, PLyTrigType is_trigger)
+PLyProcedureCache *
+PLy_procedure_get(FunctionCallInfo fcinfo, bool forValidator)
 {
-	bool		use_cache;
-	HeapTuple	procTup;
-	PLyProcedureKey key;
-	PLyProcedureEntry *volatile entry = NULL;
-	PLyProcedure *volatile proc = NULL;
-	bool		found = false;
-
-	if (is_trigger == PLPY_TRIGGER && fn_rel == InvalidOid)
-		use_cache = false;
-	else
-		use_cache = true;
+	PLyProcedure *proc;
+	PLyProcedureCache *pcache;
+	FmgrInfo   *finfo = fcinfo->flinfo;
 
-	procTup = SearchSysCache1(PROCOID, ObjectIdGetDatum(fn_oid));
-	if (!HeapTupleIsValid(procTup))
-		elog(ERROR, "cache lookup failed for function %u", fn_oid);
+	/*
+	 * If this is the first execution for this FmgrInfo, set up a cache struct
+	 * (initially containing null pointers).  The cache must live as long as
+	 * the FmgrInfo, so it goes in fn_mcxt.  Also set up a memory context
+	 * callback that will be invoked when fn_mcxt is deleted.
+	 */
+	pcache = finfo->fn_extra;
+	if (pcache == NULL)
+	{
+		pcache = (PLyProcedureCache *)
+			MemoryContextAllocZero(finfo->fn_mcxt, sizeof(PLyProcedureCache));
+
+		pcache->fcontext = finfo->fn_mcxt;
+		pcache->mcb.func = RemovePLyProcedureCache;
+		pcache->mcb.arg = pcache;
+
+		MemoryContextRegisterResetCallback(finfo->fn_mcxt, &pcache->mcb);
+
+		finfo->fn_extra = pcache;
+	}
 
 	/*
-	 * Look for the function in the cache, unless we don't have the necessary
-	 * information (e.g. during validation). In that case we just don't cache
-	 * anything.
+	 * If we are resuming execution of a set-returning function, just keep
+	 * using the same cache.  We do not ask funccache.c to re-validate the
+	 * PLyProcedure: we want to run to completion using the function's initial
+	 * definition.
 	 */
-	if (use_cache)
+	if (pcache->srfstate != NULL && pcache->srfstate->iter != NULL)
 	{
-		key.fn_oid = fn_oid;
-		key.fn_rel = fn_rel;
-		entry = hash_search(PLy_procedure_cache, &key, HASH_ENTER, &found);
-		proc = entry->proc;
+		Assert(pcache->proc != NULL);
+		return pcache;
 	}
 
-	PG_TRY();
+	/*
+	 * Look up, or re-validate, the long-lived hash entry.
+	 */
+	proc = (PLyProcedure *)
+		cached_function_compile(fcinfo,
+								(CachedFunction *) pcache->proc,
+								PLy_compile_callback,
+								PLy_delete_callback,
+								sizeof(PLyProcedure),
+								true,
+								forValidator);
+
+	/*
+	 * Install the hash pointer in the PLyProcedureCache, and increment its
+	 * use count to reflect that.  If cached_function_compile gave us back a
+	 * different hash entry than we were using before, we must decrement that
+	 * one's use count.
+	 */
+	if (proc != pcache->proc)
 	{
-		if (!found)
+		if (pcache->proc != NULL)
 		{
-			/* Haven't found it, create a new procedure */
-			proc = PLy_procedure_create(procTup, fn_oid, is_trigger);
-			if (use_cache)
-				entry->proc = proc;
-			else
-			{
-				/* Delete the proc, otherwise it's a memory leak */
-				PLy_procedure_delete(proc);
-				proc = NULL;
-			}
-		}
-		else if (!PLy_procedure_valid(proc, procTup))
-		{
-			/* Found it, but it's invalid, free and reuse the cache entry */
-			entry->proc = NULL;
-			if (proc)
-				PLy_procedure_delete(proc);
-			proc = PLy_procedure_create(procTup, fn_oid, is_trigger);
-			entry->proc = proc;
+			Assert(pcache->proc->cfunc.use_count > 0);
+			pcache->proc->cfunc.use_count--;
 		}
-		/* Found it and it's valid, it's fine to use it */
-	}
-	PG_CATCH();
-	{
-		/* Do not leave an uninitialized entry in the cache */
-		if (use_cache)
-			hash_search(PLy_procedure_cache, &key, HASH_REMOVE, NULL);
-		PG_RE_THROW();
+		pcache->proc = proc;
+		proc->cfunc.use_count++;
 	}
-	PG_END_TRY();
-
-	ReleaseSysCache(procTup);
 
-	return proc;
+	return pcache;
 }
 
 /*
  * Create a new PLyProcedure structure
  */
-static PLyProcedure *
-PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
+static void
+PLy_procedure_create(PLyProcedure *proc,
+					 HeapTuple procTup,
+					 Oid fn_oid,
+					 PLyTrigType is_trigger)
 {
 	char		procName[NAMEDATALEN + 256];
 	Form_pg_proc procStruct;
-	PLyProcedure *volatile proc;
 	MemoryContext cxt;
 	MemoryContext oldcxt;
 	int			rv;
@@ -177,7 +170,6 @@ PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
 
 	oldcxt = MemoryContextSwitchTo(cxt);
 
-	proc = palloc0_object(PLyProcedure);
 	proc->mcxt = cxt;
 
 	PG_TRY();
@@ -191,8 +183,6 @@ PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
 		proc->proname = pstrdup(NameStr(procStruct->proname));
 		MemoryContextSetIdentifier(cxt, proc->proname);
 		proc->pyname = pstrdup(procName);
-		proc->fn_xmin = HeapTupleHeaderGetRawXmin(procTup->t_data);
-		proc->fn_tid = procTup->t_self;
 		proc->fn_readonly = (procStruct->provolatile != PROVOLATILE_VOLATILE);
 		proc->is_setof = procStruct->proretset;
 		proc->is_procedure = (procStruct->prokind == PROKIND_PROCEDURE);
@@ -355,7 +345,6 @@ PLy_procedure_create(HeapTuple procTup, Oid fn_oid, PLyTrigType is_trigger)
 	PG_END_TRY();
 
 	MemoryContextSwitchTo(oldcxt);
-	return proc;
 }
 
 /*
@@ -424,23 +413,6 @@ PLy_procedure_delete(PLyProcedure *proc)
 	MemoryContextDelete(proc->mcxt);
 }
 
-/*
- * Decide whether a cached PLyProcedure struct is still valid
- */
-static bool
-PLy_procedure_valid(PLyProcedure *proc, HeapTuple procTup)
-{
-	if (proc == NULL)
-		return false;
-
-	/* If the pg_proc tuple has changed, it's not valid */
-	if (!(proc->fn_xmin == HeapTupleHeaderGetRawXmin(procTup->t_data) &&
-		  ItemPointerEquals(&proc->fn_tid, &procTup->t_self)))
-		return false;
-
-	return true;
-}
-
 static char *
 PLy_procedure_munge_source(const char *name, const char *src)
 {
@@ -485,3 +457,57 @@ PLy_procedure_munge_source(const char *name, const char *src)
 
 	return mrc;
 }
+
+static void
+PLy_compile_callback(FunctionCallInfo fcinfo,
+					 HeapTuple procTup,
+					 const CachedFunctionHashKey *hashkey,
+					 CachedFunction *cfunc,
+					 bool forValidator)
+{
+	PLyProcedure *proc = (PLyProcedure *) cfunc;
+	PLyTrigType is_trigger;
+	Oid			fn_oid = fcinfo->flinfo->fn_oid;
+
+	if (CALLED_AS_TRIGGER(fcinfo))
+		is_trigger = PLPY_TRIGGER;
+	else if (CALLED_AS_EVENT_TRIGGER(fcinfo))
+		is_trigger = PLPY_EVENT_TRIGGER;
+	else
+		is_trigger = PLPY_NOT_TRIGGER;
+
+	PLy_procedure_create(proc, procTup, fn_oid, is_trigger);
+}
+
+static void
+PLy_delete_callback(CachedFunction *cfunc)
+{
+	PLyProcedure *proc = (PLyProcedure *) cfunc;
+
+	Assert(proc->cfunc.use_count == 0);
+	Assert(proc->calldepth == 0);
+
+	PLy_procedure_delete(proc);
+}
+
+/*
+ * MemoryContext callback function
+ *
+ * We register this in the memory context that contains a PLyProcedureCache
+ * struct.  When the memory context is reset or deleted, we release the
+ * reference count (if any) that the cache holds on the long-lived hash entry.
+ * Note that this will happen even during error aborts.
+ */
+static void
+RemovePLyProcedureCache(void *arg)
+{
+	PLyProcedureCache *pcache = (PLyProcedureCache *) arg;
+
+	/* Release reference count on PLyProcedure */
+	if (pcache->proc != NULL)
+	{
+		Assert(pcache->proc->cfunc.use_count > 0);
+		pcache->proc->cfunc.use_count--;
+		pcache->proc = NULL;
+	}
+}
diff --git a/src/pl/plpython/plpy_procedure.h b/src/pl/plpython/plpy_procedure.h
index 3ef22844a9b..4527b783897 100644
--- a/src/pl/plpython/plpy_procedure.h
+++ b/src/pl/plpython/plpy_procedure.h
@@ -6,9 +6,7 @@
 #define PLPY_PROCEDURE_H
 
 #include "plpy_typeio.h"
-
-
-extern void init_procedure_caches(void);
+#include "utils/funccache.h"
 
 
 /*
@@ -31,15 +29,28 @@ typedef struct PLySavedArgs
 	PyObject   *namedargs[FLEXIBLE_ARRAY_MEMBER];	/* named args */
 } PLySavedArgs;
 
-/* cached procedure data */
+/* saved state for a set-returning function */
+typedef struct PLySRFState
+{
+	PyObject   *iter;			/* Python iterator producing results */
+	PLySavedArgs *savedargs;	/* function argument values */
+} PLySRFState;
+
+/*
+ * Long-lived data for a PL/Python function.
+ *
+ * This struct is managed by funccache.c and can be shared across multiple
+ * executions of the same function.  It must contain no execution-specific
+ * state.  The CachedFunction struct must be first so we can cast between them.
+ */
 typedef struct PLyProcedure
 {
+	CachedFunction cfunc;		/* fields managed by funccache.c */
+
 	MemoryContext mcxt;			/* context holding this PLyProcedure and its
 								 * subsidiary data */
 	char	   *proname;		/* SQL name of procedure */
 	char	   *pyname;			/* Python name of procedure */
-	TransactionId fn_xmin;
-	ItemPointerData fn_tid;
 	bool		fn_readonly;
 	bool		is_setof;		/* true, if function returns result set */
 	bool		is_procedure;
@@ -59,23 +70,27 @@ typedef struct PLyProcedure
 	PLySavedArgs *argstack;		/* stack of outer-level call arguments */
 } PLyProcedure;
 
-/* the procedure cache key */
-typedef struct PLyProcedureKey
+/*
+ * Per-call-site cache for a PL/Python function.
+ *
+ * This struct is stored in fn_extra and holds execution-specific state,
+ * including a pointer to the long-lived PLyProcedure.  The use_count in
+ * the PLyProcedure is incremented while we hold a reference.
+ */
+typedef struct PLyProcedureCache
 {
-	Oid			fn_oid;			/* function OID */
-	Oid			fn_rel;			/* triggered-on relation or InvalidOid */
-} PLyProcedureKey;
+	PLyProcedure *proc;			/* long-lived hash entry */
+	MemoryContext fcontext;		/* fn_mcxt - context holding this struct */
+	PLySRFState *srfstate;		/* SRF execution state, NULL if not in SRF */
+	bool		shutdown_reg;	/* true if registered shutdown callback */
 
-/* the procedure cache entry */
-typedef struct PLyProcedureEntry
-{
-	PLyProcedureKey key;		/* hash key */
-	PLyProcedure *proc;
-} PLyProcedureEntry;
+	/* Callback to release use-count when fcontext is deleted */
+	MemoryContextCallback mcb;
+} PLyProcedureCache;
 
 /* PLyProcedure manipulation */
 extern char *PLy_procedure_name(PLyProcedure *proc);
-extern PLyProcedure *PLy_procedure_get(Oid fn_oid, Oid fn_rel, PLyTrigType is_trigger);
+extern PLyProcedureCache *PLy_procedure_get(FunctionCallInfo fcinfo, bool forValidator);
 extern void PLy_procedure_compile(PLyProcedure *proc, const char *src);
 extern void PLy_procedure_delete(PLyProcedure *proc);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cf40c87043..636c8b27fe7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2074,6 +2074,7 @@ PLyObToTuple
 PLyObject_AsString_t
 PLyPlanObject
 PLyProcedure
+PLyProcedureCache
 PLyProcedureEntry
 PLyProcedureKey
 PLyResultObject
-- 
2.50.1 (Apple Git-155)




* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-05-28 15:12   ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  2026-06-01 22:14     ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-06-01 23:26       ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  2026-06-05 18:09         ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
@ 2026-06-05 19:11           ` Tom Lane <[email protected]>
  2026-06-05 19:35             ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Tom Lane @ 2026-06-05 19:11 UTC (permalink / raw)
  To: Matheus Alcantara <[email protected]>; +Cc: [email protected]; [email protected]; [email protected]

"Matheus Alcantara" <[email protected]> writes:
> On Mon Jun 1, 2026 at 8:26 PM -03, Tom Lane wrote:
>> Actually ... if memory serves, SQL-language functions use ValuePerCall
>> mode, so there probably already is a solution to this embedded in
>> functions.c.  Did you look at that?

> I didn't look at this before, but this was exactly the right call. SQL
> functions handle this by maintaining a per-call-site cache struct
> (SQLFunctionCache) in fn_extra that holds both the pointer to the
> long-lived hash entry and the execution state. The use_count is
> incremented when we first obtain the function and decremented via a
> MemoryContextCallback when fn_mcxt is deleted.

> I've adapted the same approach for PL/Python.

I've not read this patch yet but your high-level description seems
on-target.

Assuming the patch withstands review, there are three ways we could
proceed:

1. Hold it for v20.

2. Sneak it into v19.

3. Treat it as a back-patchable fix and put it into v18 as well.
(Going further back than v18 seems unreasonable because funccache.c
doesn't exist before that, so we'd have to back-patch it too.)

I do not think that #3 is really a great idea, mainly because the
failure case doesn't seem very likely to be hit in production,
and the lack of previous reports about this very ancient bug
bears that out.

I do find some attraction in #2, mainly because it would get the fix
into the field a year earlier than #1.  But considering we're past
beta1 it may be too late for #2 to be reasonable either.

Looping in the RMT to see what they think...

			regards, tom lane






* Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
  2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
  2026-05-25 22:26 ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-05-28 15:12   ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  2026-06-01 22:14     ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-06-01 23:26       ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
  2026-06-05 18:09         ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Matheus Alcantara <[email protected]>
  2026-06-05 19:11           ` Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct Tom Lane <[email protected]>
@ 2026-06-05 19:35             ` Matheus Alcantara <[email protected]>
  0 siblings, 0 replies; 9+ messages in thread

From: Matheus Alcantara @ 2026-06-05 19:35 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]; [email protected]; [email protected]

On 05/06/26 16:11, Tom Lane wrote:
> "Matheus Alcantara" <[email protected]> writes:
>> On Mon Jun 1, 2026 at 8:26 PM -03, Tom Lane wrote:
>>> Actually ... if memory serves, SQL-language functions use ValuePerCall
>>> mode, so there probably already is a solution to this embedded in
>>> functions.c.  Did you look at that?
> 
>> I didn't look at this before, but this was exactly the right call. SQL
>> functions handle this by maintaining a per-call-site cache struct
>> (SQLFunctionCache) in fn_extra that holds both the pointer to the
>> long-lived hash entry and the execution state. The use_count is
>> incremented when we first obtain the function and decremented via a
>> MemoryContextCallback when fn_mcxt is deleted.
> 
>> I've adapted the same approach for PL/Python.
> 
> I've not read this patch yet but your high-level description seems
> on-target.
> 
> Assuming the patch withstands review, there are three ways we could
> proceed:
> 
> 1. Hold it for v20.
> 
> 2. Sneak it into v19.
> 
> 3. Treat it as a back-patchable fix and put it into v18 as well.
> (Going further back than v18 seems unreasonable because funccache.c
> doesn't exist before that, so we'd have to back-patch it too.)
> 
> I do not think that #3 is really a great idea, mainly because the
> failure case doesn't seem very likely to be hit in production,
> and the lack of previous reports about this very ancient bug
> bears that out.
> 
> I do find some attraction in #2, mainly because it would get the fix
> into the field a year earlier than #1.  But considering we're past
> beta1 it may be too late for #2 to be reasonable either.
> 

Yeah, this sounds like the better option to me too; otherwise we can go
with #1. Back-patching this seems complicated, so I agree #3 does not
seem like a good idea.

> Looping in the RMT to see what they think...
> 

Ok

--
Matheus Alcantara
EDB: https://www.enterprisedb.com








end of thread

Thread overview: 9+ messages
2026-05-15 11:11 BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct PG Bug reporting form <[email protected]>
2026-05-25 22:26 ` Matheus Alcantara <[email protected]>
2026-05-28 14:10   ` Matheus Alcantara <[email protected]>
2026-05-28 15:12   ` Tom Lane <[email protected]>
2026-06-01 22:14     ` Matheus Alcantara <[email protected]>
2026-06-01 23:26       ` Tom Lane <[email protected]>
2026-06-05 18:09         ` Matheus Alcantara <[email protected]>
2026-06-05 19:11           ` Tom Lane <[email protected]>
2026-06-05 19:35             ` Matheus Alcantara <[email protected]>
