Re: Using Expanded Objects other than Arrays from plpgsql
34+ messages / 4 participants

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-19 19:45  Tom Lane <[email protected]>
  0 siblings, 3 replies; 34+ messages in thread

From: Tom Lane @ 2024-11-19 19:45 UTC (permalink / raw)
  To: Pavel Stehule <[email protected]>; +Cc: Michel Pelletier <[email protected]>; [email protected]

Pavel Stehule <[email protected]> writes:
> On Tue, Nov 19, 2024 at 18:51, Michel Pelletier <
> [email protected]> wrote:
>> A couple years ago I tried to compress what I learned about expanded
>> objects into a dummy extension that just provides the necessary
>> boilerplate.  It wasn't great but a start:
>> https://github.com/michelp/pgexpanded
>> Pavel Stehule indicated this might be a good example to put into contrib:

> another location could be src/test/modules - I think your example is
> "similar" to plsample

Yeah.  I think we've largely adopted the position that contrib should
contain installable modules that do something potentially useful to
end-users.  A pure skeleton wouldn't be that, but if it's fleshed out
enough to be test code for some core features then src/test/modules
could be a reasonable home.

			regards, tom lane







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-19 20:52  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  2 siblings, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2024-11-19 20:52 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Tue, Nov 19, 2024 at 11:45 AM Tom Lane <[email protected]> wrote:

> Pavel Stehule <[email protected]> writes:
> > On Tue, Nov 19, 2024 at 18:51, Michel Pelletier <
> > [email protected]> wrote:
> >> A couple years ago I tried to compress what I learned about expanded
> >> objects into a dummy extension that just provides the necessary
> >> boilerplate.  It wasn't great but a start:
> >> https://github.com/michelp/pgexpanded
> >> Pavel Stehule indicated this might be a good example to put into
> contrib:
>
> > another location could be src/test/modules - I think your example is
> > "similar" to plsample
>
> Yeah.  I think we've largely adopted the position that contrib should
> contain installable modules that do something potentially useful to
> end-users.  A pure skeleton wouldn't be that, but if it's fleshed out
> enough to be test code for some core features then src/test/modules
> could be a reasonable home.
>

Great!  I'll put a patch together that adds the skeleton object to
src/test/modules, and I'll write some regression tests that put the
expansion through its paces.  When the support function feature lands,
I'll update it to include tests for that.

Should I include Tom's patch changes on top of mine or keep those
separate?  I'm not entirely clear on the best practice to carry those
forward as well.

-Michel



* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-25 02:02  Michel Pelletier <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 2 replies; 34+ messages in thread

From: Michel Pelletier @ 2024-11-25 02:02 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Tue, Nov 19, 2024 at 12:52 PM Michel Pelletier <
[email protected]> wrote:

> On Tue, Nov 19, 2024 at 11:45 AM Tom Lane <[email protected]> wrote:
>
>> Pavel Stehule <[email protected]> writes:
>> > another location could be src/test/modules - I think your example is
>> > "similar" to plsample
>>
>> Yeah.  I think we've largely adopted the position that contrib should
>> contain installable modules that do something potentially useful to
>> end-users.  A pure skeleton wouldn't be that, but if it's fleshed out
>> enough to be test code for some core features then src/test/modules
>> could be a reasonable home.
>>
>
> Great!  I'll put a patch together that adds the skeleton object to
> src/test/modules, and I'll write some regression tests that put the
> expansion through its paces.  When the support function feature lands,
> I'll update it to include tests for that.
>

Here's a WIP patch for a pgexpanded example in src/test/modules.  The
object is very simple and starts with an integer and increments that value
every time it is expanded.  I added regression tests that exercise two
PL/pgSQL functions reproducing the expansion issue that I'm seeing with my
extension.
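
A minimal standalone sketch of that behaviour, assuming plain C with no
PostgreSQL headers.  FlatObj, ExpandedObj, expand, flatten, flat_value and
input_then_output are hypothetical names used only for illustration; they
are not the patch's API and do not use the real expanded-object machinery:

```c
/*
 * Standalone model of an object that counts its own expansions.
 * Not PostgreSQL code: every type and function here is a hypothetical
 * stand-in for the corresponding step in the test module.
 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flat form: a plain byte blob with no pointers, safe to copy or store. */
typedef struct FlatObj
{
	unsigned char bytes[sizeof(int64_t)];
} FlatObj;

/* Expanded form: an in-memory layout that is allowed to hold pointers. */
typedef struct ExpandedObj
{
	int64_t    *value;
} ExpandedObj;

/* Build an expanded object from a value, counting this expansion. */
static ExpandedObj *
expand(int64_t value)
{
	ExpandedObj *obj = malloc(sizeof(ExpandedObj));

	assert(obj != NULL);
	obj->value = malloc(sizeof(int64_t));
	assert(obj->value != NULL);
	*obj->value = value + 1;
	return obj;
}

/* Copy the expanded value into a pointer-free flat blob. */
static FlatObj
flatten(const ExpandedObj *obj)
{
	FlatObj		flat;

	memcpy(flat.bytes, obj->value, sizeof(int64_t));
	return flat;
}

/* Read the stored value back out of a flat blob. */
static int64_t
flat_value(const FlatObj *flat)
{
	int64_t		value;

	memcpy(&value, flat->bytes, sizeof(int64_t));
	return value;
}

/* Release an expanded object and the value it points to. */
static void
free_expanded(ExpandedObj *obj)
{
	free(obj->value);
	free(obj);
}

/*
 * Model of input followed by output: the input step expands once, the
 * result is flattened, and the output step expands the flat copy again.
 */
static int64_t
input_then_output(int64_t input)
{
	ExpandedObj *first = expand(input);
	FlatObj		flat = flatten(first);
	ExpandedObj *second = expand(flat_value(&flat));
	int64_t		result = *second->value;

	free_expanded(first);
	free_expanded(second);
	return result;
}
```

In this model input_then_output(0) evaluates to 2: one increment when the
input is expanded and one more when the flattened copy is expanded again
for output.  That is the same pattern as the value 2 shown for
SELECT '0'::exobj in the expected output of the attached patch.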

I considered a more complex data type like a linked list, something that
could maybe also showcase subscripting support for expanded objects, but I
didn't want to go too far without discussion.

-Michel



Attachments:

  [text/x-patch] 0001-Add-example-test-module-for-expanded-objects.patch (16.1K, 3-0001-Add-example-test-module-for-expanded-objects.patch)
From bff879a76a16d28bba4a3859ad11650a642917d2 Mon Sep 17 00:00:00 2001
From: Michel Pelletier <[email protected]>
Date: Sat, 23 Nov 2024 21:01:28 -0800
Subject: [PATCH] Add example test module for expanded objects.

This is a very simple template for creating expanded objects
that keeps track of the number of times it has been expanded.

Future feature support for expanded objects should showcase
those features here.

Discussion: https://www.postgresql.org/message-id/CACxu%3DvJNMj1MqqUiwATuazoewireaN%3D7nskD5V-CBEcrs_K6Vg%40mail.gmail.com
---
 src/test/modules/pgexpanded/.gitignore        |   3 +
 src/test/modules/pgexpanded/Makefile          |  21 +++
 src/test/modules/pgexpanded/README.md         |  21 +++
 .../pgexpanded/expected/pgexpanded.out        |  92 +++++++++
 .../modules/pgexpanded/pgexpanded--1.0.sql    |  48 +++++
 src/test/modules/pgexpanded/pgexpanded.c      | 177 ++++++++++++++++++
 .../modules/pgexpanded/pgexpanded.control     |   5 +
 src/test/modules/pgexpanded/pgexpanded.h      |  87 +++++++++
 .../modules/pgexpanded/sql/pgexpanded.sql     |   5 +
 9 files changed, 459 insertions(+)
 create mode 100644 src/test/modules/pgexpanded/.gitignore
 create mode 100644 src/test/modules/pgexpanded/Makefile
 create mode 100644 src/test/modules/pgexpanded/README.md
 create mode 100644 src/test/modules/pgexpanded/expected/pgexpanded.out
 create mode 100644 src/test/modules/pgexpanded/pgexpanded--1.0.sql
 create mode 100644 src/test/modules/pgexpanded/pgexpanded.c
 create mode 100644 src/test/modules/pgexpanded/pgexpanded.control
 create mode 100644 src/test/modules/pgexpanded/pgexpanded.h
 create mode 100644 src/test/modules/pgexpanded/sql/pgexpanded.sql

diff --git a/src/test/modules/pgexpanded/.gitignore b/src/test/modules/pgexpanded/.gitignore
new file mode 100644
index 0000000000..44d119cfcc
--- /dev/null
+++ b/src/test/modules/pgexpanded/.gitignore
@@ -0,0 +1,3 @@
+# Generated subdirectories
+/log/
+/results/
diff --git a/src/test/modules/pgexpanded/Makefile b/src/test/modules/pgexpanded/Makefile
new file mode 100644
index 0000000000..27aa55ddaa
--- /dev/null
+++ b/src/test/modules/pgexpanded/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/pgexpanded/Makefile
+
+MODULES = pgexpanded
+
+EXTENSION = pgexpanded
+DATA = pgexpanded--1.0.sql
+PGFILEDESC = "pgexpanded - template for expanded datum"
+
+REGRESS = pgexpanded
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/pgexpanded
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
diff --git a/src/test/modules/pgexpanded/README.md b/src/test/modules/pgexpanded/README.md
new file mode 100644
index 0000000000..20b11bc690
--- /dev/null
+++ b/src/test/modules/pgexpanded/README.md
@@ -0,0 +1,21 @@
+# pgexpanded
+
+This is an example postgres extension that shows how to implement an
+"expanded" data type in C as described [in this
+documentation](https://www.postgresql.org/docs/current/xtypes.html):
+
+*"Another feature that's enabled by TOAST support is the possibility of
+having an expanded in-memory data representation that is more
+convenient to work with than the format that is stored on disk. The
+regular or “flat” varlena storage format is ultimately just a blob of
+bytes; it cannot for example contain pointers, since it may get copied
+to other locations in memory. For complex data types, the flat format
+may be quite expensive to work with, so PostgreSQL provides a way to
+“expand” the flat format into a representation that is more suited to
+computation, and then pass that format in-memory between functions of
+the data type."*
+
+This module provides a simple, compilable and runnable example
+expanded data type that can be used as a basis for other extensions.
+By way of trivial example, it shows how to expand a data type that
+keeps track of the number of expansions it's gone through.
diff --git a/src/test/modules/pgexpanded/expected/pgexpanded.out b/src/test/modules/pgexpanded/expected/pgexpanded.out
new file mode 100644
index 0000000000..748dbc7f36
--- /dev/null
+++ b/src/test/modules/pgexpanded/expected/pgexpanded.out
@@ -0,0 +1,92 @@
+SET client_min_messages = 'debug1';
+CREATE EXTENSION pgexpanded;
+DEBUG:  executing extension script for "pgexpanded" version '1.0'
+SELECT '0'::exobj;
+DEBUG:  exobj_in
+LINE 1: SELECT '0'::exobj;
+               ^
+DEBUG:  new_expanded_exobj
+LINE 1: SELECT '0'::exobj;
+               ^
+DEBUG:  exobj_get_flat_size
+LINE 1: SELECT '0'::exobj;
+               ^
+DEBUG:  exobj_flatten_into
+LINE 1: SELECT '0'::exobj;
+               ^
+DEBUG:  DatumGetExobj
+DEBUG:  new_expanded_exobj
+DEBUG:  exobj_out
+DEBUG:  context_callback_exobj_free
+DEBUG:  context_callback_exobj_free
+ exobj 
+-------
+ 2
+(1 row)
+
+SELECT test_expand('0'::exobj);
+DEBUG:  exobj_in
+LINE 1: SELECT test_expand('0'::exobj);
+                           ^
+DEBUG:  new_expanded_exobj
+LINE 1: SELECT test_expand('0'::exobj);
+                           ^
+DEBUG:  exobj_get_flat_size
+LINE 1: SELECT test_expand('0'::exobj);
+                           ^
+DEBUG:  exobj_flatten_into
+LINE 1: SELECT test_expand('0'::exobj);
+                           ^
+DEBUG:  exobj_info
+DEBUG:  DatumGetExobj
+DEBUG:  new_expanded_exobj
+DEBUG:  context_callback_exobj_free
+NOTICE:  expand count 2
+DEBUG:  DatumGetExobj
+DEBUG:  new_expanded_exobj
+DEBUG:  exobj_out
+DEBUG:  context_callback_exobj_free
+DEBUG:  context_callback_exobj_free
+ test_expand 
+-------------
+ 2
+(1 row)
+
+SELECT test_expand_expand('0'::exobj);
+DEBUG:  exobj_in
+LINE 1: SELECT test_expand_expand('0'::exobj);
+                                  ^
+DEBUG:  new_expanded_exobj
+LINE 1: SELECT test_expand_expand('0'::exobj);
+                                  ^
+DEBUG:  exobj_get_flat_size
+LINE 1: SELECT test_expand_expand('0'::exobj);
+                                  ^
+DEBUG:  exobj_flatten_into
+LINE 1: SELECT test_expand_expand('0'::exobj);
+                                  ^
+DEBUG:  exobj_info
+DEBUG:  DatumGetExobj
+DEBUG:  new_expanded_exobj
+DEBUG:  context_callback_exobj_free
+NOTICE:  expand expand count 2
+DEBUG:  exobj_info
+DEBUG:  DatumGetExobj
+DEBUG:  new_expanded_exobj
+DEBUG:  context_callback_exobj_free
+NOTICE:  expand count 2
+DEBUG:  exobj_info
+DEBUG:  DatumGetExobj
+DEBUG:  new_expanded_exobj
+DEBUG:  context_callback_exobj_free
+NOTICE:  expand count 2
+DEBUG:  DatumGetExobj
+DEBUG:  new_expanded_exobj
+DEBUG:  exobj_out
+DEBUG:  context_callback_exobj_free
+DEBUG:  context_callback_exobj_free
+ test_expand_expand 
+--------------------
+ 2
+(1 row)
+
diff --git a/src/test/modules/pgexpanded/pgexpanded--1.0.sql b/src/test/modules/pgexpanded/pgexpanded--1.0.sql
new file mode 100644
index 0000000000..7f5d7e75ad
--- /dev/null
+++ b/src/test/modules/pgexpanded/pgexpanded--1.0.sql
@@ -0,0 +1,48 @@
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pgexpanded" to load this file. \quit
+
+CREATE TYPE exobj;
+
+CREATE FUNCTION exobj_in(cstring)
+RETURNS exobj
+AS '$libdir/pgexpanded', 'exobj_in'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION exobj_out(exobj)
+RETURNS cstring
+AS '$libdir/pgexpanded', 'exobj_out'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE TYPE exobj (
+    input = exobj_in,
+    output = exobj_out,
+    alignment = int4,
+    storage = 'extended',
+    internallength = -1
+);
+
+CREATE FUNCTION info(exobj)
+RETURNS bigint
+AS '$libdir/pgexpanded', 'exobj_info'
+LANGUAGE C STABLE STRICT;
+
+create or replace function test_expand(obj exobj) returns exobj language plpgsql as
+    $$
+    declare
+        i bigint = info(obj);
+    begin
+        raise notice 'expand count %', i;
+        return obj;
+    end;
+    $$;
+
+create or replace function test_expand_expand(obj exobj) returns exobj language plpgsql as
+    $$
+    declare
+        i bigint = info(obj);
+    begin
+        raise notice 'expand expand count %', i;
+        obj = test_expand(obj);
+        return test_expand(obj);
+    end;
+    $$;
diff --git a/src/test/modules/pgexpanded/pgexpanded.c b/src/test/modules/pgexpanded/pgexpanded.c
new file mode 100644
index 0000000000..562a1f2860
--- /dev/null
+++ b/src/test/modules/pgexpanded/pgexpanded.c
@@ -0,0 +1,177 @@
+#include "pgexpanded.h"
+PG_MODULE_MAGIC;
+
+/* Compute flattened size of storage needed for an exobj */
+static Size
+exobj_get_flat_size(ExpandedObjectHeader *eohptr) {
+	pgexpanded_Exobj *A = (pgexpanded_Exobj*) eohptr;
+	Size nbytes;
+
+	LOGF();
+
+	/* This is a sanity check that the object is initialized */
+	Assert(A->em_magic == exobj_MAGIC);
+
+	/* Use cached value if already computed */
+	if (A->flat_size) {
+		return A->flat_size;
+	}
+
+	/* Add the overhead of the flat header to the size of the data
+	   payload */
+	nbytes = PGEXPANDED_EXOBJ_OVERHEAD();
+	nbytes += sizeof(int64_t);
+
+	/* Cache this value in the expanded object */
+	A->flat_size = nbytes;
+	return nbytes;
+}
+
+/* Flatten exobj into a pre-allocated result buffer that is
+   allocated_size in bytes.  */
+static void
+exobj_flatten_into(ExpandedObjectHeader *eohptr,
+				   void *result, Size allocated_size)  {
+	void *data;
+
+	/* Cast EOH pointer to expanded object, and result pointer to flat
+	   object */
+	pgexpanded_Exobj *A = (pgexpanded_Exobj *) eohptr;
+	pgexpanded_FlatExobj *flat = (pgexpanded_FlatExobj *) result;
+
+	LOGF();
+
+	/* Sanity check the object is valid */
+	Assert(A->em_magic == exobj_MAGIC);
+	Assert(allocated_size == A->flat_size);
+
+	/* Zero out the whole allocated buffer */
+	memset(flat, 0, allocated_size);
+
+	/* Get the pointer to the start of the flattened data and copy the
+	   expanded value into it */
+	data = PGEXPANDED_EXOBJ_DATA(flat);
+	memcpy(data, A->value, sizeof(int64_t));
+
+	/* Set the size of the varlena object */
+	SET_VARSIZE(flat, allocated_size);
+}
+
+/* Build a new expanded exobj holding value + 1 and return a pointer to it. */
+pgexpanded_Exobj *
+new_expanded_exobj(int64_t value, MemoryContext parentcontext) {
+	pgexpanded_Exobj *A;
+
+	MemoryContext objcxt, oldcxt;
+	MemoryContextCallback *ctxcb;
+
+	LOGF();
+
+	/* Create a new context that will hold the expanded object. */
+	objcxt = AllocSetContextCreate(parentcontext,
+								   "expanded exobj",
+								   ALLOCSET_DEFAULT_SIZES);
+
+	/* Allocate a new expanded exobj */
+	A = (pgexpanded_Exobj*)MemoryContextAlloc(objcxt,
+											  sizeof(pgexpanded_Exobj));
+
+	/* Initialize the ExpandedObjectHeader member with flattening
+	 * methods and the new object context */
+	EOH_init_header(&A->hdr, &exobj_methods, objcxt);
+
+	/* Used for debugging checks */
+	A->em_magic = exobj_MAGIC;
+
+	/* Switch to new object context */
+	oldcxt = MemoryContextSwitchTo(objcxt);
+
+	/* Get value from flat object and increment it */
+	A->value = palloc(sizeof(int64_t));
+	*(A->value) = value + 1;
+
+	/* A flat size of zero means the flattened size is not yet computed. */
+	A->flat_size = 0;
+
+	/* Create a context callback to free exobj when context is cleared */
+	ctxcb = MemoryContextAlloc(objcxt, sizeof(MemoryContextCallback));
+
+	ctxcb->func = context_callback_exobj_free;
+	ctxcb->arg = A;
+	MemoryContextRegisterResetCallback(objcxt, ctxcb);
+
+	/* Switch back to old context */
+	MemoryContextSwitchTo(oldcxt);
+	return A;
+}
+
+/* MemoryContextCallback function to free exobj data when their
+   context goes out of scope. */
+static void
+context_callback_exobj_free(void* ptr) {
+	pgexpanded_Exobj *A = (pgexpanded_Exobj *) ptr;
+	LOGF();
+	pfree(A->value);
+}
+
+/* Helper function that always returns an expanded exobj.
+
+   This is used by PGEXPANDED_GETARG_EXOBJ */
+pgexpanded_Exobj *
+DatumGetExobj(Datum d) {
+	pgexpanded_Exobj *A;
+	pgexpanded_FlatExobj *flat;
+	int64_t *value;
+
+	LOGF();
+	if (VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(d))) {
+		A = ExobjGetEOHP(d);
+		Assert(A->em_magic == exobj_MAGIC);
+		return A;
+	}
+	flat = (pgexpanded_FlatExobj*)PG_DETOAST_DATUM(d);
+	value = PGEXPANDED_EXOBJ_DATA(flat);
+	A = new_expanded_exobj(*value, CurrentMemoryContext);
+	return A;
+}
+
+Datum
+exobj_in(PG_FUNCTION_ARGS) {
+	char *input;
+	pgexpanded_Exobj *result;
+	int64_t value;
+	LOGF();
+	input = PG_GETARG_CSTRING(0);
+	value = strtoll(input, NULL, 10);
+	result = new_expanded_exobj(value, CurrentMemoryContext);
+	PGEXPANDED_RETURN_EXOBJ(result);
+}
+
+Datum
+exobj_out(PG_FUNCTION_ARGS)
+{
+	char *result;
+	pgexpanded_Exobj *A = PGEXPANDED_GETARG_EXOBJ(0);
+	LOGF();
+	result = palloc(32);
+	snprintf(result, 32, "%lld", (long long int) *A->value);
+	PG_RETURN_CSTRING(result);
+}
+
+Datum
+exobj_info(PG_FUNCTION_ARGS) {
+	pgexpanded_Exobj *A;
+	LOGF();
+	A = PGEXPANDED_GETARG_EXOBJ(0);
+	return Int64GetDatum(*A->value);
+}
+
+void
+_PG_init(void)
+{
+	LOGF();
+}
+/* Local Variables: */
+/* mode: c */
+/* c-file-style: "postgresql" */
+/* End: */
diff --git a/src/test/modules/pgexpanded/pgexpanded.control b/src/test/modules/pgexpanded/pgexpanded.control
new file mode 100644
index 0000000000..2e9d133484
--- /dev/null
+++ b/src/test/modules/pgexpanded/pgexpanded.control
@@ -0,0 +1,5 @@
+# pgexpanded extension
+comment = 'Example Postgres extension for expanded data types.'
+default_version = '1.0'
+relocatable = true
+requires = ''
diff --git a/src/test/modules/pgexpanded/pgexpanded.h b/src/test/modules/pgexpanded/pgexpanded.h
new file mode 100644
index 0000000000..4adebace5f
--- /dev/null
+++ b/src/test/modules/pgexpanded/pgexpanded.h
@@ -0,0 +1,87 @@
+#ifndef PGEXPANDED_H
+#define PGEXPANDED_H
+
+#include "postgres.h"
+#include "funcapi.h"
+#include "utils/expandeddatum.h"
+
+/* ID for debugging crosschecks */
+#define exobj_MAGIC 689276813
+
+#define LOGF() elog(DEBUG1, "%s", __func__)
+
+/* Flattened representation of exobj, used to store to disk.
+
+   The first 32 bits must be the length of the data.  Actual flattened data
+   is appended after this struct and cannot exceed 1GB.
+*/
+typedef struct pgexpanded_FlatExobj {
+	int32 vl_len_;
+} pgexpanded_FlatExobj;
+
+/* Expanded representation of exobj.
+
+   When loaded from storage, the flattened representation is used to
+   build the exobj.  In this case, it's just a pointer to an integer.
+*/
+typedef struct pgexpanded_Exobj  {
+	ExpandedObjectHeader hdr;
+	int em_magic;
+	Size flat_size;
+	int64_t *value;
+} pgexpanded_Exobj;
+
+/* Callback function for freeing exobj arrays. */
+static void
+context_callback_exobj_free(void*);
+
+/* Expanded Object Header "methods" for flattening for storage */
+static Size
+exobj_get_flat_size(ExpandedObjectHeader *eohptr);
+
+static void
+exobj_flatten_into(ExpandedObjectHeader *eohptr,
+				   void *result, Size allocated_size);
+
+static const ExpandedObjectMethods exobj_methods = {
+	exobj_get_flat_size,
+	exobj_flatten_into
+};
+
+/* Create a new exobj datum. */
+pgexpanded_Exobj *
+new_expanded_exobj(int64_t value,  MemoryContext parentcontext);
+
+/* Helper function that either detoasts or expands. */
+pgexpanded_Exobj *DatumGetExobj(Datum d);
+
+/* Helper macro to detoast and expand exobj arguments */
+#define PGEXPANDED_GETARG_EXOBJ(n)  DatumGetExobj(PG_GETARG_DATUM(n))
+
+/* Helper macro to return Expanded Object Header Pointer from exobj. */
+#define PGEXPANDED_RETURN_EXOBJ(A) return EOHPGetRWDatum(&(A)->hdr)
+
+/* Helper macro to compute flat exobj header size */
+#define PGEXPANDED_EXOBJ_OVERHEAD() MAXALIGN(sizeof(pgexpanded_FlatExobj))
+
+/* Helper macro to get pointer to beginning of exobj data. */
+#define PGEXPANDED_EXOBJ_DATA(a) ((int64_t *)(((char *) (a)) + PGEXPANDED_EXOBJ_OVERHEAD()))
+
+/* Helper macro to cast generic Datum header pointer to expanded Exobj */
+#define ExobjGetEOHP(d) ((pgexpanded_Exobj *) DatumGetEOHP(d))
+
+/* Public API functions */
+
+PG_FUNCTION_INFO_V1(exobj);
+PG_FUNCTION_INFO_V1(exobj_in);
+PG_FUNCTION_INFO_V1(exobj_out);
+PG_FUNCTION_INFO_V1(exobj_info);
+
+void
+_PG_init(void);
+
+#endif /* PGEXPANDED_H */
+/* Local Variables: */
+/* mode: c */
+/* c-file-style: "postgresql" */
+/* End: */
diff --git a/src/test/modules/pgexpanded/sql/pgexpanded.sql b/src/test/modules/pgexpanded/sql/pgexpanded.sql
new file mode 100644
index 0000000000..8858851b32
--- /dev/null
+++ b/src/test/modules/pgexpanded/sql/pgexpanded.sql
@@ -0,0 +1,5 @@
+SET client_min_messages = 'debug1';
+CREATE EXTENSION pgexpanded;
+SELECT '0'::exobj;
+SELECT test_expand('0'::exobj);
+SELECT test_expand_expand('0'::exobj);
-- 
2.34.1




* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-04 00:42  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  1 sibling, 1 reply; 34+ messages in thread

From: Tom Lane @ 2024-12-04 00:42 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

Michel Pelletier <[email protected]> writes:
> Here's a WIP patch for a pgexpanded example in src/test/modules.

I didn't look at your patch yet, but in the meantime here's an update
that takes the next step towards what I promised.

0001-0003 are the same as before, with a couple of trivial changes
to rebase them up to current HEAD.  0004 adds a support function
request to allow extension functions to perform in-place updates.
You should be able to use that to improve what your extension
is doing.  The new comments in supportnodes.h explain how to
use it (plus see the built-in examples, though they are quite
simple).

			regards, tom lane



Attachments:

  [text/x-diff] v2-0001-Preliminary-refactoring.patch (9.5K, 2-v2-0001-Preliminary-refactoring.patch)
From 2b5c421f608d77994c1a898c9860b8a28bc2f46f Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Tue, 3 Dec 2024 14:31:04 -0500
Subject: [PATCH v2 1/4] Preliminary refactoring.

This short and boring patch simply moves the responsibility for
initializing PLpgSQL_expr.target_param into plpgsql parsing,
rather than doing it at first execution of the expr as before.
This doesn't save anything in terms of runtime, since the work was
trivial and done only once per expr anyway.  But it makes the info
available during parsing, which will be useful for the next step.

Likewise set PLpgSQL_expr.func during parsing.  According to the
comments, this was once impossible; but it's certainly possible
since we invented the plpgsql_curr_compile variable.  Again, this
saves little runtime, but it seems far cleaner conceptually.

While at it, I reordered stuff in struct PLpgSQL_expr to make it
clearer which fields are filled when, and merged some duplicative
code in pl_gram.y.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_exec.c | 27 ---------------
 src/pl/plpgsql/src/pl_gram.y | 65 ++++++++++++++++++++++++------------
 src/pl/plpgsql/src/plpgsql.h | 31 +++++++++--------
 3 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index e31206e7f4..1a9c010205 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4174,12 +4174,6 @@ exec_prepare_plan(PLpgSQL_execstate *estate,
 	SPIPlanPtr	plan;
 	SPIPrepareOptions options;
 
-	/*
-	 * The grammar can't conveniently set expr->func while building the parse
-	 * tree, so make sure it's set before parser hooks need it.
-	 */
-	expr->func = estate->func;
-
 	/*
 	 * Generate and save the plan
 	 */
@@ -5016,21 +5010,7 @@ exec_assign_expr(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 	 * If first time through, create a plan for this expression.
 	 */
 	if (expr->plan == NULL)
-	{
-		/*
-		 * Mark the expression as being an assignment source, if target is a
-		 * simple variable.  (This is a bit messy, but it seems cleaner than
-		 * modifying the API of exec_prepare_plan for the purpose.  We need to
-		 * stash the target dno into the expr anyway, so that it will be
-		 * available if we have to replan.)
-		 */
-		if (target->dtype == PLPGSQL_DTYPE_VAR)
-			expr->target_param = target->dno;
-		else
-			expr->target_param = -1;	/* should be that already */
-
 		exec_prepare_plan(estate, expr, 0);
-	}
 
 	value = exec_eval_expr(estate, expr, &isnull, &valtype, &valtypmod);
 	exec_assign_value(estate, target, value, isnull, valtype, valtypmod);
@@ -6282,13 +6262,6 @@ setup_param_list(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 		 * that they are interrupting an active use of parameters.
 		 */
 		paramLI->parserSetupArg = expr;
-
-		/*
-		 * Also make sure this is set before parser hooks need it.  There is
-		 * no need to save and restore, since the value is always correct once
-		 * set.  (Should be set already, but let's be sure.)
-		 */
-		expr->func = estate->func;
 	}
 	else
 	{
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 8182ce28aa..5431977d69 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -64,6 +64,10 @@ static	bool			tok_is_keyword(int token, union YYSTYPE *lval,
 static	void			word_is_not_variable(PLword *word, int location);
 static	void			cword_is_not_variable(PLcword *cword, int location);
 static	void			current_token_is_not_variable(int tok);
+static	PLpgSQL_expr	*make_plpgsql_expr(const char *query,
+										   RawParseMode parsemode);
+static	void			expr_is_assignment_source(PLpgSQL_expr *expr,
+												  PLpgSQL_datum *target);
 static	PLpgSQL_expr	*read_sql_construct(int until,
 											int until2,
 											int until3,
@@ -529,6 +533,10 @@ decl_statement	: decl_varname decl_const decl_datatype decl_collate decl_notnull
 									 errmsg("variable \"%s\" must have a default value, since it's declared NOT NULL",
 											var->refname),
 									 parser_errposition(@5)));
+
+						if (var->default_val != NULL)
+							expr_is_assignment_source(var->default_val,
+													  (PLpgSQL_datum *) var);
 					}
 				| decl_varname K_ALIAS K_FOR decl_aliasitem ';'
 					{
@@ -987,6 +995,7 @@ stmt_assign		: T_DATUM
 													   pmode,
 													   false, true,
 													   NULL, NULL);
+						expr_is_assignment_source(new->expr, $1.datum);
 
 						$$ = (PLpgSQL_stmt *) new;
 					}
@@ -2637,6 +2646,38 @@ current_token_is_not_variable(int tok)
 		yyerror("syntax error");
 }
 
+/* Convenience routine to construct a PLpgSQL_expr struct */
+static PLpgSQL_expr *
+make_plpgsql_expr(const char *query,
+				  RawParseMode parsemode)
+{
+	PLpgSQL_expr *expr = palloc0(sizeof(PLpgSQL_expr));
+
+	expr->query = pstrdup(query);
+	expr->parseMode = parsemode;
+	expr->func = plpgsql_curr_compile;
+	expr->ns = plpgsql_ns_top();
+	/* might get changed later during parsing: */
+	expr->target_param = -1;
+	/* other fields are left as zeroes until first execution */
+	return expr;
+}
+
+/* Mark a PLpgSQL_expr as being the source of an assignment to target */
+static void
+expr_is_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
+{
+	/*
+	 * Mark the expression as being an assignment source, if target is a
+	 * simple variable.  We don't currently support optimized assignments to
+	 * other DTYPEs.
+	 */
+	if (target->dtype == PLPGSQL_DTYPE_VAR)
+		expr->target_param = target->dno;
+	else
+		expr->target_param = -1;	/* should be that already */
+}
+
 /* Convenience routine to read an expression with one possible terminator */
 static PLpgSQL_expr *
 read_sql_expression(int until, const char *expected)
@@ -2774,13 +2815,7 @@ read_sql_construct(int until,
 	 */
 	plpgsql_append_source_text(&ds, startlocation, endlocation);
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = parsemode;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, parsemode);
 	pfree(ds.data);
 
 	if (valid_sql)
@@ -3102,13 +3137,7 @@ make_execsql_stmt(int firsttoken, int location, PLword *word)
 	while (ds.len > 0 && scanner_isspace(ds.data[ds.len - 1]))
 		ds.data[--ds.len] = '\0';
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_DEFAULT;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_DEFAULT);
 	pfree(ds.data);
 
 	check_sql_expr(expr->query, expr->parseMode, location);
@@ -3980,13 +4009,7 @@ read_cursor_args(PLpgSQL_var *cursor, int until)
 			appendStringInfoString(&ds, ", ");
 	}
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_PLPGSQL_EXPR;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_PLPGSQL_EXPR);
 	pfree(ds.data);
 
 	/* Next we'd better find the until token */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 50c3b28472..fbb6000caa 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -219,14 +219,22 @@ typedef struct PLpgSQL_expr
 {
 	char	   *query;			/* query string, verbatim from function body */
 	RawParseMode parseMode;		/* raw_parser() mode to use */
-	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
-	Bitmapset  *paramnos;		/* all dnos referenced by this query */
+	struct PLpgSQL_function *func;	/* function containing this expr */
+	struct PLpgSQL_nsitem *ns;	/* namespace chain visible to this expr */
 
-	/* function containing this expr (not set until we first parse query) */
-	struct PLpgSQL_function *func;
+	/*
+	 * These fields are used to help optimize assignments to expanded-datum
+	 * variables.  If this expression is the source of an assignment to a
+	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 */
+	int			target_param;	/* dno of assign target, or -1 if none */
 
-	/* namespace chain visible to this expr */
-	struct PLpgSQL_nsitem *ns;
+	/*
+	 * Fields above are set during plpgsql parsing.  Remaining fields are left
+	 * as zeroes/NULLs until we first parse/plan the query.
+	 */
+	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
+	Bitmapset  *paramnos;		/* all dnos referenced by this query */
 
 	/* fields for "simple expression" fast-path execution: */
 	Expr	   *expr_simple_expr;	/* NULL means not a simple expr */
@@ -235,14 +243,11 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * These fields are used to optimize assignments to expanded-datum
-	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno; else it's -1.
-	 * If we match a Param within expr_simple_expr to such a variable, that
-	 * Param's address is stored in expr_rw_param; then expression code
-	 * generation will allow the value for that Param to be passed read/write.
+	 * If we match a Param within expr_simple_expr to the variable identified
+	 * by target_param, that Param's address is stored in expr_rw_param; then
+	 * expression code generation will allow the value for that Param to be
+	 * passed as a read/write expanded-object pointer.
 	 */
-	int			target_param;	/* dno of assign target, or -1 if none */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
-- 
2.43.5



  [text/x-diff] v2-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch (19.3K, 3-v2-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch)
From 0e8a5e90ff546353391a57849413e802ec746153 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Tue, 3 Dec 2024 14:32:13 -0500
Subject: [PATCH v2 2/4] Detect whether plpgsql assignment targets are "local"
 variables.

Mark whether the target of a potentially optimizable assignment
is "local", in the sense of being declared inside any exception
block that could trap an error thrown from the assignment.
(This implies that we needn't preserve the variable's value
in case of an error.)

Normally, this requires a post-parsing scan of the function's
parse tree, since we don't know while parsing a BEGIN ...
construct whether we will find EXCEPTION at its end.  However,
if there are no BEGIN ... EXCEPTION blocks in the function at
all, then all assignments are local, even those to variables
representing function arguments.  We optimize that common case
by initializing the target_is_local flags to "true", and fixing
them up with a post-scan only if we found EXCEPTION.
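
As a rough illustration of the rule above (not the patch's code), the
marking pass can be modeled on a toy statement tree.  Everything named
Toy*/toy_* below is hypothetical; a 32-bit mask stands in for the
Bitmapset of local dnos, and exception handler bodies are left out
(the real scan visits those too, with an empty local set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the plpgsql parse tree; none of this is PostgreSQL API. */
typedef enum { TOY_ASSIGN, TOY_BLOCK } ToyKind;

typedef struct ToyStmt
{
    ToyKind     kind;
    /* TOY_ASSIGN fields */
    int         target_dno;         /* variable being assigned to */
    bool        target_is_local;    /* result of the marking pass */
    /* TOY_BLOCK fields */
    bool        has_exception;      /* does the block end with EXCEPTION? */
    uint32_t    declared;           /* bitmask of dnos in its DECLARE section */
    struct ToyStmt **body;
    size_t      nbody;
} ToyStmt;

/* Recursive scan: local_dnos holds the variables local at this level. */
void
toy_mark(ToyStmt *stmt, uint32_t local_dnos)
{
    uint32_t    inner;

    if (stmt->kind == TOY_ASSIGN)
    {
        assert(stmt->target_dno >= 0 && stmt->target_dno < 32);
        stmt->target_is_local = (local_dnos >> stmt->target_dno) & 1u;
        return;
    }

    /*
     * A block with EXCEPTION starts a new exception scope, so nothing
     * declared so far (including its own DECLARE section) is local inside.
     * A plain block just adds its own declarations to the local set.
     */
    inner = stmt->has_exception ? 0 : (local_dnos | stmt->declared);
    for (size_t i = 0; i < stmt->nbody; i++)
        toy_mark(stmt->body[i], inner);
}

/*
 * Outer block declares x (dno 1) and assigns it; an inner block declares
 * y (dno 2) and assigns y, then x.  dno 0 plays a function argument.
 * Result bits: 1 = outer "x :=", 2 = inner "y :=", 4 = inner "x :=".
 */
unsigned
toy_demo(bool inner_has_exception)
{
    ToyStmt     assign_y = {.kind = TOY_ASSIGN, .target_dno = 2};
    ToyStmt     assign_x_inner = {.kind = TOY_ASSIGN, .target_dno = 1};
    ToyStmt    *inner_body[] = {&assign_y, &assign_x_inner};
    ToyStmt     inner = {.kind = TOY_BLOCK,
                         .has_exception = inner_has_exception,
                         .declared = 1u << 2,
                         .body = inner_body, .nbody = 2};
    ToyStmt     assign_x_outer = {.kind = TOY_ASSIGN, .target_dno = 1};
    ToyStmt    *outer_body[] = {&assign_x_outer, &inner};
    ToyStmt     outer = {.kind = TOY_BLOCK, .declared = 1u << 1,
                         .body = outer_body, .nbody = 2};

    toy_mark(&outer, 1u << 0);

    return (assign_x_outer.target_is_local ? 1u : 0u) |
        (assign_y.target_is_local ? 2u : 0u) |
        (assign_x_inner.target_is_local ? 4u : 0u);
}
```

With no EXCEPTION anywhere, toy_demo(false) reports all three
assignments as local, which is why the flags can simply default to
"true" in that case.  With EXCEPTION on the inner block,
toy_demo(true) leaves only the outer assignment local.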

The scan is implemented by code that's largely copied-and-pasted
from the nearby code to scan a plpgsql parse tree for deletion.
It's a bit annoying to have three copies of that now, but I'm
not seeing a way to refactor it that would save much code on net.

Note that variables' default-value expressions are never interesting
for expanded-variable optimization, since they couldn't contain a
reference to the target variable anyway.  But the code is set up
to compute their target_param and target_is_local correctly anyway,
for consistency and in case someone thinks of a use for that data.

I added a bit of plpgsql_dumptree support to help verify that
this code sets the flags as expected.  I'm not set on keeping
that, but I do want to keep the addition of a plpgsql_dumptree
call in plpgsql_compile_inline.  It's at best an oversight that
"#option dump" doesn't work in a DO block.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_comp.c  |  12 +
 src/pl/plpgsql/src/pl_funcs.c | 398 ++++++++++++++++++++++++++++++++++
 src/pl/plpgsql/src/pl_gram.y  |  15 ++
 src/pl/plpgsql/src/plpgsql.h  |   7 +-
 4 files changed, 431 insertions(+), 1 deletion(-)

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 6255a86d75..ed9845d85a 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -365,6 +365,7 @@ do_compile(FunctionCallInfo fcinfo,
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	/*
 	 * Initialize the compiler, particularly the namespace stack.  The
@@ -806,6 +807,9 @@ do_compile(FunctionCallInfo fcinfo,
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
 	/* Debug dump for completed functions */
 	if (plpgsql_DumpExecTree)
 		plpgsql_dumptree(function);
@@ -899,6 +903,7 @@ plpgsql_compile_inline(char *proc_source)
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	plpgsql_ns_init();
 	plpgsql_ns_push(func_name, PLPGSQL_LABEL_BLOCK);
@@ -956,6 +961,13 @@ plpgsql_compile_inline(char *proc_source)
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
+	/* Debug dump for completed functions */
+	if (plpgsql_DumpExecTree)
+		plpgsql_dumptree(function);
+
 	/*
 	 * Pop the error context stack
 	 */
diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index eeb7c4d7c0..889377fc9a 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -333,6 +333,401 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind)
 }
 
 
+/**********************************************************************
+ * Mark assignment source expressions that have local target variables,
+ * that is, variables declared within the exception block most closely
+ * containing the assignment itself.  (Such target variables need not be
+ * preserved if the assignment's source expression raises an error,
+ * allowing better optimization.)
+ *
+ * This code need not be called if the plpgsql function contains no exception
+ * blocks, because expr_is_assignment_source() will have set all the flags
+ * to true already.  Also, we need not examine default-value expressions for
+ * variables, because variable declarations are necessarily within the nearest
+ * exception block.  (In DECLARE ... BEGIN ... EXCEPTION ... END, the variable
+ * initializations are done before entering the exception scope.)  So it's
+ * sufficient to find assignment statements.
+ *
+ * Within the recursion, local_dnos is a Bitmapset of dnos of variables
+ * known to be declared within the current exception level.
+ **********************************************************************/
+static void mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos);
+static void mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos);
+static void mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos);
+static void mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos);
+static void mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos);
+static void mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos);
+static void mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos);
+static void mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos);
+static void mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos);
+static void mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos);
+static void mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos);
+static void mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos);
+static void mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos);
+static void mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos);
+static void mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos);
+static void mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos);
+static void mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos);
+static void mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos);
+static void mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos);
+static void mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos);
+static void mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos);
+static void mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos);
+static void mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos);
+static void mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos);
+static void mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos);
+static void mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos);
+static void mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos);
+static void mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos);
+
+
+static void
+mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos)
+{
+	switch (stmt->cmd_type)
+	{
+		case PLPGSQL_STMT_BLOCK:
+			mark_block((PLpgSQL_stmt_block *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSIGN:
+			mark_assign((PLpgSQL_stmt_assign *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_IF:
+			mark_if((PLpgSQL_stmt_if *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CASE:
+			mark_case((PLpgSQL_stmt_case *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_LOOP:
+			mark_loop((PLpgSQL_stmt_loop *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_WHILE:
+			mark_while((PLpgSQL_stmt_while *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORI:
+			mark_fori((PLpgSQL_stmt_fori *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORS:
+			mark_fors((PLpgSQL_stmt_fors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORC:
+			mark_forc((PLpgSQL_stmt_forc *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FOREACH_A:
+			mark_foreach_a((PLpgSQL_stmt_foreach_a *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXIT:
+			mark_exit((PLpgSQL_stmt_exit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN:
+			mark_return((PLpgSQL_stmt_return *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_NEXT:
+			mark_return_next((PLpgSQL_stmt_return_next *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_QUERY:
+			mark_return_query((PLpgSQL_stmt_return_query *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RAISE:
+			mark_raise((PLpgSQL_stmt_raise *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSERT:
+			mark_assert((PLpgSQL_stmt_assert *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXECSQL:
+			mark_execsql((PLpgSQL_stmt_execsql *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNEXECUTE:
+			mark_dynexecute((PLpgSQL_stmt_dynexecute *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNFORS:
+			mark_dynfors((PLpgSQL_stmt_dynfors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_GETDIAG:
+			mark_getdiag((PLpgSQL_stmt_getdiag *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_OPEN:
+			mark_open((PLpgSQL_stmt_open *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FETCH:
+			mark_fetch((PLpgSQL_stmt_fetch *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CLOSE:
+			mark_close((PLpgSQL_stmt_close *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_PERFORM:
+			mark_perform((PLpgSQL_stmt_perform *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CALL:
+			mark_call((PLpgSQL_stmt_call *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_COMMIT:
+			mark_commit((PLpgSQL_stmt_commit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ROLLBACK:
+			mark_rollback((PLpgSQL_stmt_rollback *) stmt, local_dnos);
+			break;
+		default:
+			elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type);
+			break;
+	}
+}
+
+static void
+mark_stmts(List *stmts, Bitmapset *local_dnos)
+{
+	ListCell   *s;
+
+	foreach(s, stmts)
+	{
+		mark_stmt((PLpgSQL_stmt *) lfirst(s), local_dnos);
+	}
+}
+
+static void
+mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos)
+{
+	if (block->exceptions)
+	{
+		ListCell   *e;
+
+		/*
+		 * The block creates a new exception scope, so variables declared at
+		 * outer levels are nonlocal.  For that matter, so are any variables
+		 * declared in the block's DECLARE section.  Hence, we must pass down
+		 * empty local_dnos.
+		 */
+		mark_stmts(block->body, NULL);
+
+		foreach(e, block->exceptions->exc_list)
+		{
+			PLpgSQL_exception *exc = (PLpgSQL_exception *) lfirst(e);
+
+			mark_stmts(exc->action, NULL);
+		}
+	}
+	else
+	{
+		/*
+		 * Otherwise, the block does not create a new exception scope, and any
+		 * variables it declares can also be considered local within it.  Note
+		 * that only initializable datum types (VAR, REC) are included in
+		 * initvarnos; but that's sufficient for our purposes.
+		 */
+		local_dnos = bms_copy(local_dnos);
+		for (int i = 0; i < block->n_initvars; i++)
+			local_dnos = bms_add_member(local_dnos, block->initvarnos[i]);
+		mark_stmts(block->body, local_dnos);
+		bms_free(local_dnos);
+	}
+}
+
+static void
+mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos)
+{
+	PLpgSQL_expr *expr = stmt->expr;
+
+	/*
+	 * If the assignment target is a plain DTYPE_VAR datum, mark it as local
+	 * or not.  (If it's not a VAR, we don't care.)
+	 */
+	if (expr->target_param >= 0)
+		expr->target_is_local = bms_is_member(expr->target_param, local_dnos);
+}
+
+static void
+mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->then_body, local_dnos);
+	foreach(l, stmt->elsif_list)
+	{
+		PLpgSQL_if_elsif *elif = (PLpgSQL_if_elsif *) lfirst(l);
+
+		/* elif->cond cannot be an assignment source */
+		mark_stmts(elif->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_body, local_dnos);
+}
+
+static void
+mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->t_expr cannot be an assignment source */
+	foreach(l, stmt->case_when_list)
+	{
+		PLpgSQL_case_when *cwt = (PLpgSQL_case_when *) lfirst(l);
+
+		/* cwt->expr cannot be an assignment source */
+		mark_stmts(cwt->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_stmts, local_dnos);
+}
+
+static void
+mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->lower, upper, step cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+}
+
+static void
+mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->argquery cannot be an assignment source */
+}
+
+static void
+mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->argquery, query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+}
+
+static void
+mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->params cannot contain an assignment source */
+	/* stmt->options cannot contain an assignment source */
+}
+
+static void
+mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond, message cannot be an assignment source */
+}
+
+static void
+mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->sqlstmt cannot be an assignment source */
+}
+
+static void
+mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos)
+{
+}
+
+void
+plpgsql_mark_local_assignment_targets(PLpgSQL_function *func)
+{
+	Bitmapset  *local_dnos;
+
+	/* Function parameters can be treated as local targets at outer level */
+	local_dnos = NULL;
+	for (int i = 0; i < func->fn_nargs; i++)
+		local_dnos = bms_add_member(local_dnos, func->fn_argvarnos[i]);
+	if (func->action)
+		mark_block(func->action, local_dnos);
+	bms_free(local_dnos);
+}
+
+
 /**********************************************************************
  * Release memory when a PL/pgSQL function is no longer needed
  *
@@ -1594,6 +1989,9 @@ static void
 dump_expr(PLpgSQL_expr *expr)
 {
 	printf("'%s'", expr->query);
+	if (expr->target_param >= 0)
+		printf(" target %d%s", expr->target_param,
+			   expr->target_is_local ? " (local)" : "");
 }
 
 void
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 5431977d69..ddbfda8388 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2314,6 +2314,8 @@ exception_sect	:
 						PLpgSQL_exception_block *new = palloc(sizeof(PLpgSQL_exception_block));
 						PLpgSQL_variable *var;
 
+						plpgsql_curr_compile->has_exception_block = true;
+
 						var = plpgsql_build_variable("sqlstate", lineno,
 													 plpgsql_build_datatype(TEXTOID,
 																			-1,
@@ -2659,6 +2661,7 @@ make_plpgsql_expr(const char *query,
 	expr->ns = plpgsql_ns_top();
 	/* might get changed later during parsing: */
 	expr->target_param = -1;
+	expr->target_is_local = false;
 	/* other fields are left as zeroes until first execution */
 	return expr;
 }
@@ -2673,9 +2676,21 @@ expr_is_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
 	 * other DTYPEs.
 	 */
 	if (target->dtype == PLPGSQL_DTYPE_VAR)
+	{
 		expr->target_param = target->dno;
+
+		/*
+		 * For now, assume the target is local to the nearest enclosing
+		 * exception block.  That's correct if the function contains no
+		 * exception blocks; otherwise we'll update this later.
+		 */
+		expr->target_is_local = true;
+	}
 	else
+	{
 		expr->target_param = -1;	/* should be that already */
+		expr->target_is_local = false; /* ditto */
+	}
 }
 
 /* Convenience routine to read an expression with one possible terminator */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index fbb6000caa..c6fadc5660 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -225,9 +225,12 @@ typedef struct PLpgSQL_expr
 	/*
 	 * These fields are used to help optimize assignments to expanded-datum
 	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 * simple variable, target_param holds that variable's dno (else it's -1),
+	 * and target_is_local indicates whether the target is declared inside the
+	 * closest exception block containing the assignment.
 	 */
 	int			target_param;	/* dno of assign target, or -1 if none */
+	bool		target_is_local;	/* is it within nearest exception block? */
 
 	/*
 	 * Fields above are set during plpgsql parsing.  Remaining fields are left
@@ -1014,6 +1017,7 @@ typedef struct PLpgSQL_function
 	/* data derived while parsing body */
 	unsigned int nstatements;	/* counter for assigning stmtids */
 	bool		requires_procedure_resowner;	/* contains CALL or DO? */
+	bool		has_exception_block;	/* contains BEGIN...EXCEPTION? */
 
 	/* these fields change when the function is used */
 	struct PLpgSQL_execstate *cur_estate;
@@ -1314,6 +1318,7 @@ extern PLpgSQL_nsitem *plpgsql_ns_find_nearest_loop(PLpgSQL_nsitem *ns_cur);
  */
 extern PGDLLEXPORT const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt);
 extern const char *plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind);
+extern void plpgsql_mark_local_assignment_targets(PLpgSQL_function *func);
 extern void plpgsql_free_function_memory(PLpgSQL_function *func);
 extern void plpgsql_dumptree(PLpgSQL_function *func);
 
-- 
2.43.5



  [text/x-diff] v2-0003-Implement-new-optimization-rule-for-updates-of-ex.patch (26.3K, 4-v2-0003-Implement-new-optimization-rule-for-updates-of-ex.patch)
From 4666ac2ba3433574cd3024f49f58b290a58b1f86 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Tue, 3 Dec 2024 14:43:33 -0500
Subject: [PATCH v2 3/4] Implement new optimization rule for updates of
 expanded variables.

If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.
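
A simplified sketch of that decision order, for illustration only:
toy_choose_rwopt and the ToyRWOpt names are hypothetical, and the
"previous rule" is collapsed into a single boolean input rather than
the actual inspection of the expression's top-level function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three possible outcomes. */
typedef enum
{
    TOY_RWOPT_NOPE,             /* always pass the value read-only */
    TOY_RWOPT_TRANSFER,         /* hand the R/W value to the expression */
    TOY_RWOPT_INPLACE           /* pass R/W to a trusted top-level function */
} ToyRWOpt;

/*
 * target_is_local:  target declared inside the nearest exception block
 * nrefs:            references to the target in the assignment RHS
 * top_func_is_safe: assumed result of the older top-level-function test
 */
ToyRWOpt
toy_choose_rwopt(bool target_is_local, int nrefs, bool top_func_is_safe)
{
    /* the check only runs while evaluating a Param for the target */
    assert(nrefs >= 1);

    /* New rule: nobody can observe the old value after an error. */
    if (target_is_local && nrefs == 1)
        return TOY_RWOPT_TRANSFER;

    /* Otherwise fall back to the older, function-based rule. */
    if (top_func_is_safe)
        return TOY_RWOPT_INPLACE;

    return TOY_RWOPT_NOPE;
}
```

Read against the regression test added in plpgsql_array.sql, this
would give TRANSFER for "a := array_append(a, 42)" (local, one
reference), INPLACE for "a := a || a[3]" (two references, trusted
top-level function), and NOPE for "a := a || a"; mapping those
statements onto these three inputs is an interpretation, not
something the patch spells out.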

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.
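
The postponement can be pictured as a callback that replaces itself
once it has something to decide.  The sketch below is a toy model
under stated assumptions: ToyStep and the toy_eval_* functions are
hypothetical, the "optimizable" flag stands in for the real analysis,
and the per-expression caching of the decision is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyStep ToyStep;
typedef int (*ToyEvalFn) (ToyStep *step, bool value_is_rw);

struct ToyStep
{
    ToyEvalFn   fn;             /* currently installed callback */
    bool        optimizable;    /* what the costly analysis would conclude */
    int         analyses;       /* how many times the analysis has run */
};

enum { TOY_DELIVER_AS_IS, TOY_DELIVER_RO, TOY_DELIVER_RW };

/* Steady state: downgrade any R/W value to read-only. */
int
toy_eval_ro(ToyStep *step, bool value_is_rw)
{
    (void) step;
    return value_is_rw ? TOY_DELIVER_RO : TOY_DELIVER_AS_IS;
}

/* Steady state: let a R/W value through unchanged. */
int
toy_eval_rw(ToyStep *step, bool value_is_rw)
{
    (void) step;
    return value_is_rw ? TOY_DELIVER_RW : TOY_DELIVER_AS_IS;
}

/* Initial callback: decide only when a R/W value actually shows up. */
int
toy_eval_check(ToyStep *step, bool value_is_rw)
{
    assert(step != NULL);

    if (!value_is_rw)
        return TOY_DELIVER_AS_IS;   /* nothing to decide yet */

    step->analyses++;               /* the expensive part happens here */
    step->fn = step->optimizable ? toy_eval_rw : toy_eval_ro;
    return step->fn(step, value_is_rw);
}
```

A step that only ever sees flat or read-only values never pays for
the analysis; once a R/W value appears, the decision is made and
later executions go straight to the selected callback.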

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/include/executor/execExpr.h               |   1 +
 src/pl/plpgsql/src/expected/plpgsql_array.out |   9 +
 src/pl/plpgsql/src/pl_exec.c                  | 376 +++++++++++++++---
 src/pl/plpgsql/src/plpgsql.h                  |  22 +-
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |   9 +
 src/tools/pgindent/typedefs.list              |   2 +
 6 files changed, 357 insertions(+), 62 deletions(-)

diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 56fb0d0adb..7d58b3dc9c 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -406,6 +406,7 @@ typedef struct ExprEvalStep
 		{
 			ExecEvalSubroutine paramfunc;	/* add-on evaluation subroutine */
 			void	   *paramarg;	/* private data for same */
+			void	   *paramarg2;	/* more private data for same */
 			int			paramid;	/* numeric ID for parameter */
 			Oid			paramtype;	/* OID of parameter's datatype */
 		}			cparam;
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index ad60e0e8be..e5db6d6087 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -52,6 +52,15 @@ NOTICE:  a = ("{""(,11)""}",), a.c1[1].i = 11
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 NOTICE:  a = {1,2,3}
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 1a9c010205..ae878782b8 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -251,6 +251,15 @@ static HTAB *shared_cast_hash = NULL;
 	else \
 		Assert(rc == PLPGSQL_RC_OK)
 
+/* State struct for count_param_references */
+typedef struct count_param_references_context
+{
+	int			paramid;
+	int			count;
+	Param	   *last_param;
+} count_param_references_context;
+
+
 /************************************************************
  * Local function forward declarations
  ************************************************************/
@@ -336,7 +345,9 @@ static void exec_prepare_plan(PLpgSQL_execstate *estate,
 static void exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr);
 static bool exec_is_simple_query(PLpgSQL_expr *expr);
 static void exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan);
-static void exec_check_rw_parameter(PLpgSQL_expr *expr);
+static void exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid);
+static bool count_param_references(Node *node,
+								   count_param_references_context *context);
 static void exec_check_assignable(PLpgSQL_execstate *estate, int dno);
 static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
 								  PLpgSQL_expr *expr,
@@ -384,6 +395,10 @@ static ParamExternData *plpgsql_param_fetch(ParamListInfo params,
 static void plpgsql_param_compile(ParamListInfo params, Param *param,
 								  ExprState *state,
 								  Datum *resv, bool *resnull);
+static void plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+										 ExprContext *econtext);
+static void plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+											ExprContext *econtext);
 static void plpgsql_param_eval_var(ExprState *state, ExprEvalStep *op,
 								   ExprContext *econtext);
 static void plpgsql_param_eval_var_ro(ExprState *state, ExprEvalStep *op,
@@ -6078,10 +6093,13 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 
 		/*
 		 * Reset to "not simple" to leave sane state (with no dangling
-		 * pointers) in case we fail while replanning.  expr_simple_plansource
-		 * can be left alone however, as that cannot move.
+		 * pointers) in case we fail while replanning.  We'll need to
+		 * re-determine simplicity and R/W optimizability anyway, since those
+		 * could change with the new plan.  expr_simple_plansource can be left
+		 * alone however, as that cannot move.
 		 */
 		expr->expr_simple_expr = NULL;
+		expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 		expr->expr_rw_param = NULL;
 		expr->expr_simple_plan = NULL;
 		expr->expr_simple_plan_lxid = InvalidLocalTransactionId;
@@ -6439,16 +6457,27 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	scratch.resnull = resnull;
 
 	/*
-	 * Select appropriate eval function.  It seems worth special-casing
-	 * DTYPE_VAR and DTYPE_RECFIELD for performance.  Also, we can determine
-	 * in advance whether MakeExpandedObjectReadOnly() will be required.
-	 * Currently, only VAR/PROMISE and REC datums could contain read/write
-	 * expanded objects.
+	 * Select appropriate eval function.
+	 *
+	 * First, if this Param references the same varlena-type DTYPE_VAR datum
+	 * that is the target of the assignment containing this simple expression,
+	 * then it's possible we will be able to optimize handling of R/W expanded
+	 * datums.  We don't want to do the work needed to determine that unless
+	 * we actually see a R/W expanded datum at runtime, so install a checking
+	 * function that will figure that out when needed.
+	 *
+	 * Otherwise, it seems worth special-casing DTYPE_VAR and DTYPE_RECFIELD
+	 * for performance.  Also, we can determine in advance whether
+	 * MakeExpandedObjectReadOnly() will be required.  Currently, only
+	 * VAR/PROMISE and REC datums could contain read/write expanded objects.
 	 */
 	if (datum->dtype == PLPGSQL_DTYPE_VAR)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		bool		isvarlena = (((PLpgSQL_var *) datum)->datatype->typlen == -1);
+
+		if (isvarlena && dno == expr->target_param && expr->expr_simple_expr)
+			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_check;
+		else if (isvarlena)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var;
@@ -6457,14 +6486,12 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_recfield;
 	else if (datum->dtype == PLPGSQL_DTYPE_PROMISE)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		if (((PLpgSQL_var *) datum)->datatype->typlen == -1)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
 	}
-	else if (datum->dtype == PLPGSQL_DTYPE_REC &&
-			 param != expr->expr_rw_param)
+	else if (datum->dtype == PLPGSQL_DTYPE_REC)
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 	else
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
@@ -6473,14 +6500,170 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	 * Note: it's tempting to use paramarg to store the estate pointer and
 	 * thereby save an indirection or two in the eval functions.  But that
 	 * doesn't work because the compiled expression might be used with
-	 * different estates for the same PL/pgSQL function.
+	 * different estates for the same PL/pgSQL function.  Instead, store
+	 * pointers to the PLpgSQL_expr as well as this specific Param, to support
+	 * plpgsql_param_eval_var_check().
 	 */
-	scratch.d.cparam.paramarg = NULL;
+	scratch.d.cparam.paramarg = expr;
+	scratch.d.cparam.paramarg2 = param;
 	scratch.d.cparam.paramid = param->paramid;
 	scratch.d.cparam.paramtype = param->paramtype;
 	ExprEvalPushStep(state, &scratch);
 }
 
+/*
+ * plpgsql_param_eval_var_check		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we may need to determine the applicability of a read/write optimization,
+ * but we've not done that yet.
+ */
+static void
+plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+							 ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, it's time to
+	 * decide whether/how to optimize the assignment.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		PLpgSQL_expr *expr = (PLpgSQL_expr *) op->d.cparam.paramarg;
+		Param	   *param = (Param *) op->d.cparam.paramarg2;
+
+		/*
+		 * We might have already figured this out while evaluating some other
+		 * Param referencing the same variable.
+		 */
+		if (expr->expr_rwopt == PLPGSQL_RWOPT_UNKNOWN)
+			exec_check_rw_parameter(expr, op->d.cparam.paramid);
+
+		/*
+		 * Update the callback pointer to match what we decided to do, and
+		 * pass off this execution to the selected function.
+		 */
+		switch (expr->expr_rwopt)
+		{
+			case PLPGSQL_RWOPT_UNKNOWN:
+				Assert(false);
+				break;
+			case PLPGSQL_RWOPT_NOPE:
+				/* Force the value to read-only in all future executions */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+				plpgsql_param_eval_var_ro(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_TRANSFER:
+				/* There can be only one matching Param in this case */
+				Assert(param == expr->expr_rw_param);
+				/* When the value is read/write, transfer to exec context */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_transfer;
+				plpgsql_param_eval_var_transfer(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_INPLACE:
+				if (param == expr->expr_rw_param)
+				{
+					/* When the value is read/write, deliver it as-is */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var;
+					plpgsql_param_eval_var(state, op, econtext);
+				}
+				else
+				{
+					/* Not the optimizable reference, so force to read-only */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+					plpgsql_param_eval_var_ro(state, op, econtext);
+				}
+				break;
+		}
+		return;
+	}
+
+	/*
+	 * Otherwise, continue to postpone that decision, and execute an inlined
+	 * version of exec_eval_datum().  Although this value could potentially
+	 * need MakeExpandedObjectReadOnly, we know it doesn't right now.
+	 */
+	*op->resvalue = var->value;
+	*op->resnull = var->isnull;
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
+/*
+ * plpgsql_param_eval_var_transfer		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we have determined that a read/write expanded value can be handed off
+ * into execution of the expression (and then possibly returned to our
+ * function's ownership afterwards).  We have to test though, because the
+ * variable might not contain a read/write expanded value during this
+ * execution.
+ */
+static void
+plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+								ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, transfer its
+	 * ownership into the expression execution context, then drop our own
+	 * reference to the value by setting the variable to NULL.  That'll be
+	 * overwritten (perhaps with this same object) when control comes back
+	 * from the expression.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		*op->resvalue = TransferExpandedObject(var->value,
+											   get_eval_mcontext(estate));
+		*op->resnull = false;
+
+		var->value = (Datum) 0;
+		var->isnull = true;
+		var->freeval = false;
+	}
+	else
+	{
+		/*
+		 * Otherwise we can pass the variable's value directly; we now know
+		 * that MakeExpandedObjectReadOnly isn't needed.
+		 */
+		*op->resvalue = var->value;
+		*op->resnull = var->isnull;
+	}
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
 /*
  * plpgsql_param_eval_var		evaluation of EEOP_PARAM_CALLBACK step
  *
@@ -7957,9 +8140,10 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 	MemoryContext oldcontext;
 
 	/*
-	 * Initialize to "not simple".
+	 * Initialize to "not simple", and reset R/W optimizability.
 	 */
 	expr->expr_simple_expr = NULL;
+	expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 	expr->expr_rw_param = NULL;
 
 	/*
@@ -8164,88 +8348,133 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
 	expr->expr_simple_typmod = exprTypmod((Node *) tle_expr);
 	/* We also want to remember if it is immutable or not */
 	expr->expr_simple_mutable = contain_mutable_functions((Node *) tle_expr);
-
-	/*
-	 * Lastly, check to see if there's a possibility of optimizing a
-	 * read/write parameter.
-	 */
-	exec_check_rw_parameter(expr);
 }
 
 /*
  * exec_check_rw_parameter --- can we pass expanded object as read/write param?
  *
- * If we have an assignment like "x := array_append(x, foo)" in which the
+ * There are two separate cases in which we can optimize an update to a
+ * variable that has a read/write expanded value by letting the called
+ * expression operate directly on the expanded value.  In both cases we
+ * are considering assignments like "var := array_append(var, foo)" where
+ * the assignment target is also an input to the RHS expression.
+ *
+ * Case 1 (RWOPT_TRANSFER rule): if the variable is "local" in the sense that
+ * its declaration is not outside any BEGIN...EXCEPTION block surrounding the
+ * assignment, then we do not need to worry about preserving its value if the
+ * RHS expression throws an error.  If in addition the variable is referenced
+ * exactly once in the RHS expression, then we can optimize by converting the
+ * read/write expanded value into a transient value within the expression
+ * evaluation context, and then setting the variable's recorded value to NULL
+ * to prevent double-free attempts.  This works regardless of any other
+ * details of the RHS expression.  If the expression eventually returns that
+ * same expanded object (possibly modified) then the variable will re-acquire
+ * ownership; while if it returns something else or throws an error, the
+ * expanded object will be discarded as part of cleanup of the evaluation
+ * context.
+ *
+ * Case 2 (RWOPT_INPLACE rule): if we have a non-local assignment or if
+ * it looks like "var := array_append(var, var[1])" with multiple references
+ * to the target variable, then we can't use case 1.  Nonetheless, if the
  * top-level function is trusted not to corrupt its argument in case of an
- * error, then when x has an expanded object as value, it is safe to pass the
- * value as a read/write pointer and let the function modify the value
- * in-place.
+ * error, then when the var has an expanded object as value, it is safe to
+ * pass the value as a read/write pointer to the top-level function and let
+ * the function modify the value in-place.  (Any other references have to be
+ * passed as read-only pointers as usual.)  Only the top-level function has to
+ * be trusted, since if anything further down fails, the object hasn't been
+ * modified yet.
  *
- * This function checks for a safe expression, and sets expr->expr_rw_param
- * to the address of any Param within the expression that can be passed as
- * read/write (there can be only one); or to NULL when there is no safe Param.
+ * This function checks to see if the assignment is optimizable according
+ * to either rule, and updates expr->expr_rwopt accordingly.  In addition,
+ * it sets expr->expr_rw_param to the address of the Param within the
+ * expression that can be passed as read/write (there can be only one);
+ * or to NULL when there is no safe Param.
  *
- * Note that this mechanism intentionally applies the safety labeling to just
- * one Param; the expression could contain other Params referencing the target
- * variable, but those must still be treated as read-only.
+ * Note that this mechanism intentionally allows just one Param to emit a
+ * read/write pointer; in case 2, the expression could contain other Params
+ * referencing the target variable, but those must be treated as read-only.
  *
  * Also note that we only apply this optimization within simple expressions.
  * There's no point in it for non-simple expressions, because the
  * exec_run_select code path will flatten any expanded result anyway.
- * Also, it's safe to assume that an expr_simple_expr tree won't get copied
- * somewhere before it gets compiled, so that looking for pointer equality
- * to expr_rw_param will work for matching the target Param.  That'd be much
- * shakier in the general case.
  */
 static void
-exec_check_rw_parameter(PLpgSQL_expr *expr)
+exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 {
-	int			target_dno;
+	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
 	ListCell   *lc;
 
 	/* Assume unsafe */
+	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
 	expr->expr_rw_param = NULL;
 
-	/* Done if expression isn't an assignment source */
-	target_dno = expr->target_param;
-	if (target_dno < 0)
-		return;
+	/* Shouldn't be here for non-simple expression */
+	Assert(sexpr != NULL);
+
+	/* Param should match the expression's assignment target, too */
+	Assert(paramid == expr->target_param + 1);
 
 	/*
-	 * If target variable isn't referenced by expression, no need to look
-	 * further.
+	 * If the assignment is to a "local" variable (one whose value won't
+	 * matter anymore if expression evaluation fails), and this Param is the
+	 * only reference to that variable in the expression, then we can
+	 * unconditionally optimize using the "transfer" method.
 	 */
-	if (!bms_is_member(target_dno, expr->paramnos))
-		return;
+	if (expr->target_is_local)
+	{
+		count_param_references_context context;
 
-	/* Shouldn't be here for non-simple expression */
-	Assert(expr->expr_simple_expr != NULL);
+		/* See how many references there are, and find one of them */
+		context.paramid = paramid;
+		context.count = 0;
+		context.last_param = NULL;
+		(void) count_param_references((Node *) sexpr, &context);
+
+		/* If we're here, the expr must contain some reference to the var */
+		Assert(context.count > 0);
+
+		/* If exactly one reference, success! */
+		if (context.count == 1)
+		{
+			expr->expr_rwopt = PLPGSQL_RWOPT_TRANSFER;
+			expr->expr_rw_param = context.last_param;
+			return;
+		}
+	}
 
 	/*
+	 * Otherwise, see if we can trust the expression's top-level function to
+	 * apply the "inplace" method.
+	 *
 	 * Top level of expression must be a simple FuncExpr, OpExpr, or
-	 * SubscriptingRef, else we can't optimize.
+	 * SubscriptingRef, else we can't identify which function is relevant.  But
+	 * it's okay to look through any RelabelType above that, since that can't
+	 * fail.
 	 */
-	if (IsA(expr->expr_simple_expr, FuncExpr))
+	if (IsA(sexpr, RelabelType))
+		sexpr = ((RelabelType *) sexpr)->arg;
+	if (IsA(sexpr, FuncExpr))
 	{
-		FuncExpr   *fexpr = (FuncExpr *) expr->expr_simple_expr;
+		FuncExpr   *fexpr = (FuncExpr *) sexpr;
 
 		funcid = fexpr->funcid;
 		fargs = fexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, OpExpr))
+	else if (IsA(sexpr, OpExpr))
 	{
-		OpExpr	   *opexpr = (OpExpr *) expr->expr_simple_expr;
+		OpExpr	   *opexpr = (OpExpr *) sexpr;
 
 		funcid = opexpr->opfuncid;
 		fargs = opexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, SubscriptingRef))
+	else if (IsA(sexpr, SubscriptingRef))
 	{
-		SubscriptingRef *sbsref = (SubscriptingRef *) expr->expr_simple_expr;
+		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
 		/* We only trust standard varlena arrays to be safe */
+		/* TODO: install some extensibility here */
 		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
 			F_ARRAY_SUBSCRIPT_HANDLER)
 			return;
@@ -8256,9 +8485,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) sbsref->refexpr;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8293,9 +8523,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) arg;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8303,6 +8534,35 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 	}
 }
 
+/*
+ * Count Params referencing the specified paramid, and return one of them
+ * if there are any.
+ *
+ * We actually only need to distinguish 0, 1, and N references; so we can
+ * abort the tree traversal as soon as we've found two.
+ */
+static bool
+count_param_references(Node *node, count_param_references_context *context)
+{
+	if (node == NULL)
+		return false;
+	else if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		if (param->paramkind == PARAM_EXTERN &&
+			param->paramid == context->paramid)
+		{
+			context->last_param = param;
+			if (++(context->count) > 1)
+				return true;	/* abort tree traversal */
+		}
+		return false;
+	}
+	else
+		return expression_tree_walker(node, count_param_references, context);
+}
+
 /*
  * exec_check_assignable --- is it OK to assign to the indicated datum?
  *
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index c6fadc5660..3bafeea28b 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -187,6 +187,17 @@ typedef enum PLpgSQL_resolve_option
 	PLPGSQL_RESOLVE_COLUMN,		/* prefer table column to plpgsql var */
 } PLpgSQL_resolve_option;
 
+/*
+ * Status of optimization of assignment to a read/write expanded object
+ */
+typedef enum PLpgSQL_rwopt
+{
+	PLPGSQL_RWOPT_UNKNOWN = 0,	/* applicability not determined yet */
+	PLPGSQL_RWOPT_NOPE,			/* cannot do any optimization */
+	PLPGSQL_RWOPT_TRANSFER,		/* transfer the old value into expr state */
+	PLPGSQL_RWOPT_INPLACE,		/* pass value as R/W to top-level function */
+} PLpgSQL_rwopt;
+
 
 /**********************************************************************
  * Node and structure definitions
@@ -246,11 +257,14 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * If we match a Param within expr_simple_expr to the variable identified
-	 * by target_param, that Param's address is stored in expr_rw_param; then
-	 * expression code generation will allow the value for that Param to be
-	 * passed as a read/write expanded-object pointer.
+	 * expr_rwopt tracks whether we have determined that assignment to a
+	 * read/write expanded object (stored in the target_param datum) can be
+	 * optimized by passing it to the expr as a read/write expanded-object
+	 * pointer.  If so, expr_rw_param identifies the specific Param that
+	 * should emit a read/write pointer; any others will emit read-only
+	 * pointers.
 	 */
+	PLpgSQL_rwopt expr_rwopt;	/* can we apply R/W optimization? */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4b9ff51594..4a346203dc 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -48,6 +48,15 @@ begin a.c1[1].i := 11; raise notice 'a = %, a.c1[1].i = %', a, a.c1[1].i; end$$;
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+
 create temp table onecol as select array[1,2] as f1;
 
 do $$ declare a int[];
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2d4c870423..80328115a1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1867,6 +1867,7 @@ PLpgSQL_rec
 PLpgSQL_recfield
 PLpgSQL_resolve_option
 PLpgSQL_row
+PLpgSQL_rwopt
 PLpgSQL_stmt
 PLpgSQL_stmt_assert
 PLpgSQL_stmt_assign
@@ -3392,6 +3393,7 @@ core_yy_extra_type
 core_yyscan_t
 corrupt_items
 cost_qual_eval_context
+count_param_references_context
 cp_hash_func
 create_upper_paths_hook_type
 createdb_failure_params
-- 
2.43.5
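
For readers following the thread, the way exec_check_rw_parameter() picks between the "transfer" and "inplace" rules in the patch above can be modelled roughly as follows.  This is only a toy sketch, not PostgreSQL code: the ToyNode type, the "trusted" flag (standing in for the hard-wired function list) and all toy_* names are assumptions made up for illustration, and the reference counter does not stop early at two references the way count_param_references() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the patch's enum and node types (hypothetical names) */
typedef enum ToyRwOpt
{
	TOY_RWOPT_NOPE,				/* cannot do any optimization */
	TOY_RWOPT_TRANSFER,			/* hand the old value over to the expression */
	TOY_RWOPT_INPLACE			/* pass value as R/W to top-level function */
} ToyRwOpt;

typedef enum ToyKind
{
	TOY_PARAM,
	TOY_CONST,
	TOY_FUNC
} ToyKind;

typedef struct ToyNode
{
	ToyKind		kind;
	int			paramid;		/* TOY_PARAM: which variable is referenced */
	bool		trusted;		/* TOY_FUNC: leaves its input intact on error */
	int			nargs;			/* TOY_FUNC: number of arguments */
	const struct ToyNode *args[2];
} ToyNode;

/* Trees for the three assignments in the regression test; "a" is paramid 1 */
static const ToyNode a_ref1 = {.kind = TOY_PARAM, .paramid = 1};
static const ToyNode a_ref2 = {.kind = TOY_PARAM, .paramid = 1};
static const ToyNode const_42 = {.kind = TOY_CONST};
static const ToyNode fetch_a3 = {.kind = TOY_FUNC, .nargs = 1,
	.args = {&a_ref2}};			/* a[3] */
static const ToyNode append_a_42 = {.kind = TOY_FUNC, .trusted = true,
	.nargs = 2, .args = {&a_ref1, &const_42}};	/* array_append(a, 42) */
static const ToyNode append_a_a3 = {.kind = TOY_FUNC, .trusted = true,
	.nargs = 2, .args = {&a_ref1, &fetch_a3}};	/* a || a[3] */
static const ToyNode cat_a_a = {.kind = TOY_FUNC,
	.nargs = 2, .args = {&a_ref1, &a_ref2}};	/* a || a */

/* Count references to paramid anywhere in the tree; remember the last one */
static int
toy_count_refs(const ToyNode *node, int paramid, const ToyNode **last)
{
	int			count = 0;

	if (node == NULL)
		return 0;
	if (node->kind == TOY_PARAM)
	{
		if (node->paramid != paramid)
			return 0;
		*last = node;
		return 1;
	}
	for (int i = 0; i < node->nargs; i++)
		count += toy_count_refs(node->args[i], paramid, last);
	return count;
}

/* Decide which rule applies; *rw_param gets the one reference allowed R/W */
static ToyRwOpt
toy_check_rw(const ToyNode *expr, int paramid, bool target_is_local,
			 const ToyNode **rw_param)
{
	*rw_param = NULL;

	/* Case 1: local target referenced exactly once => "transfer" */
	if (target_is_local)
	{
		const ToyNode *last = NULL;
		int			count = toy_count_refs(expr, paramid, &last);

		assert(count > 0);
		if (count == 1)
		{
			*rw_param = last;
			return TOY_RWOPT_TRANSFER;
		}
	}

	/* Case 2: trusted top-level function with the target as a direct arg */
	if (expr->kind == TOY_FUNC && expr->trusted)
	{
		for (int i = 0; i < expr->nargs; i++)
		{
			const ToyNode *arg = expr->args[i];

			if (arg->kind == TOY_PARAM && arg->paramid == paramid)
			{
				*rw_param = arg;
				return TOY_RWOPT_INPLACE;
			}
		}
	}
	return TOY_RWOPT_NOPE;
}
```

Under these assumptions, toy_check_rw(&append_a_42, 1, true, &p) selects the transfer rule, while the same call on append_a_a3 falls through to the inplace rule with p pointing at the first reference, and cat_a_a gets no optimization, which lines up with the comments in the plpgsql_array.sql test.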



  [text/x-diff] v2-0004-Allow-extension-functions-to-participate-in-in-pl.patch (17.1K, 5-v2-0004-Allow-extension-functions-to-participate-in-in-pl.patch)
From 62828464b0b74a46da56d110ce2c831641f0b2bc Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Tue, 3 Dec 2024 19:25:22 -0500
Subject: [PATCH v2 4/4] Allow extension functions to participate in in-place
 updates.

Commit 1dc5ebc90 allowed PL/pgSQL to perform in-place updates
of expanded-object variables that are being updated with
assignments like "x := f(x, ...)".  However this was allowed
only for a hard-wired list of functions f(), since we need to
be sure that f() will not modify the variable if it fails.
It was always envisioned that we should make that extensible,
but at the time we didn't have a good way to do so.  Since
then we've invented the idea of "support functions" to allow
attaching specialized optimization knowledge to functions,
and that is a perfect mechanism for doing this.

Hence, adjust PL/pgSQL to use a support function request
instead of hard-wired logic to decide if in-place update
is safe.  Replace the previous behavior by creating support
functions for the three functions that were previously
hard-wired.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/backend/utils/adt/array_userfuncs.c       | 61 +++++++++++++
 src/backend/utils/adt/arraysubs.c             | 34 ++++++++
 src/include/catalog/pg_proc.dat               | 20 +++--
 src/include/nodes/supportnodes.h              | 55 +++++++++++-
 src/pl/plpgsql/src/expected/plpgsql_array.out |  3 +-
 src/pl/plpgsql/src/pl_exec.c                  | 86 ++++++++-----------
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |  1 +
 src/tools/pgindent/typedefs.list              |  1 +
 8 files changed, 202 insertions(+), 59 deletions(-)

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 304a93112e..bddacd4802 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -16,6 +16,7 @@
 #include "common/int.h"
 #include "common/pg_prng.h"
 #include "libpq/pqformat.h"
+#include "nodes/supportnodes.h"
 #include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
@@ -167,6 +168,36 @@ array_append(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_append_support()
+ *
+ * Planner support function for array_append()
+ */
+Datum
+array_append_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place appends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) linitial(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_prepend :
  *		push an element onto the front of a one-dimensional array
@@ -230,6 +261,36 @@ array_prepend(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_prepend_support()
+ *
+ * Planner support function for array_prepend()
+ */
+Datum
+array_prepend_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place prepends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) lsecond(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_cat :
  *		concatenate two nD arrays to form an nD array, or
diff --git a/src/backend/utils/adt/arraysubs.c b/src/backend/utils/adt/arraysubs.c
index 6f68dfa5b2..3c4f1664d3 100644
--- a/src/backend/utils/adt/arraysubs.c
+++ b/src/backend/utils/adt/arraysubs.c
@@ -18,6 +18,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/subscripting.h"
+#include "nodes/supportnodes.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_expr.h"
 #include "utils/array.h"
@@ -575,3 +576,36 @@ raw_array_subscript_handler(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(&sbsroutines);
 }
+
+/*
+ * array_subscript_handler_support()
+ *
+ * Planner support function for array_subscript_handler()
+ */
+Datum
+array_subscript_handler_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place subscripted assignment if the refexpr is
+		 * the array being assigned to.  We don't need to worry about array
+		 * references within the refassgnexpr or the subscripts; however, if
+		 * there's no refassgnexpr then it's a fetch, which there's no need to
+		 * optimize.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *refexpr = (Param *) linitial(req->args);
+
+		if (refexpr && IsA(refexpr, Param) &&
+			refexpr->paramkind == PARAM_EXTERN &&
+			refexpr->paramid == req->paramid &&
+			lsecond(req->args) != NULL)
+			ret = (Node *) refexpr;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9575524007..79a9dc383a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1598,14 +1598,20 @@
   proname => 'cardinality', prorettype => 'int4', proargtypes => 'anyarray',
   prosrc => 'array_cardinality' },
 { oid => '378', descr => 'append element onto end of array',
-  proname => 'array_append', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_append', prosupport => 'array_append_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_append' },
+{ oid => '8680', descr => 'planner support for array_append',
+  proname => 'array_append_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_append_support' },
 { oid => '379', descr => 'prepend element onto front of array',
-  proname => 'array_prepend', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_prepend', prosupport => 'array_prepend_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatible anycompatiblearray',
   prosrc => 'array_prepend' },
+{ oid => '8681', descr => 'planner support for array_prepend',
+  proname => 'array_prepend_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_prepend_support' },
 { oid => '383',
   proname => 'array_cat', proisstrict => 'f',
   prorettype => 'anycompatiblearray',
@@ -12160,8 +12166,12 @@
 
 # subscripting support for built-in types
 { oid => '6179', descr => 'standard array subscripting support',
-  proname => 'array_subscript_handler', prorettype => 'internal',
+  proname => 'array_subscript_handler',
+  prosupport => 'array_subscript_handler_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'array_subscript_handler' },
+{ oid => '8682', descr => 'planner support for array_subscript_handler',
+  proname => 'array_subscript_handler_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_subscript_handler_support' },
 { oid => '6180', descr => 'raw array subscripting support',
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index 5f7bcde891..1ca6e477ac 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -6,10 +6,10 @@
  * This file defines the API for "planner support functions", which
  * are SQL functions (normally written in C) that can be attached to
  * another "target" function to give the system additional knowledge
- * about the target function.  All the current capabilities have to do
- * with planning queries that use the target function, though it is
- * possible that future extensions will add functionality to be invoked
- * by the parser or executor.
+ * about the target function.  The name is now something of a misnomer,
+ * since some of the call sites are in the executor, not the planner,
+ * but "function support function" would be a confusing name so we
+ * stick with "planner support function".
  *
  * A support function must have the SQL signature
  *		supportfn(internal) returns internal
@@ -343,4 +343,51 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+/*
+ * The ModifyInPlace request allows the support function to detect whether
+ * a call to its target function can be allowed to modify a read/write
+ * expanded object in-place.  The context is that we are considering a
+ * PL/pgSQL (or similar PL) assignment of the form "x := f(x, ...)" where
+ * the variable x is of a type that can be represented as an expanded object
+ * (see utils/expandeddatum.h).  If f() can usefully optimize by modifying
+ * the passed-in object in-place, then this request can be implemented to
+ * instruct PL/pgSQL to pass a read-write expanded pointer to the variable's
+ * value.  (Note that there is no guarantee that later calls to f() will
+ * actually do so.  If f() receives a read-only pointer, or a pointer to a
+ * non-expanded object, it must follow the usual convention of not modifying
+ * the pointed-to object.)  There are two requirements that must be met
+ * to make this safe:
+ * 1. f() must guarantee that it will not have modified the object if it
+ * fails.  Otherwise the variable's value might change unexpectedly.
+ * 2. If the other arguments to f() ("..." in the above example) contain
+ * references to x, f() must be able to cope with that; or if that's not
+ * safe, the support function must scan the other arguments to verify that
+ * there are no other references to x.  An example of the concern here is
+ * that in "arr := array_append(arr, arr[1])", if the array element type
+ * is pass-by-reference then array_append would receive a second argument
+ * that points into the array object it intends to modify.  array_append is
+ * coded to make that safe, but other functions might not be able to cope.
+ *
+ * "args" is a node tree list representing the function's arguments.
+ * One or more nodes within the node tree will be PARAM_EXTERN Params
+ * with ID "paramid", which represent the assignment target variable.
+ * (Note that such references are not necessarily at top level in the list,
+ * for example we might have "x := f(x, g(x))".  Generally it's only safe
+ * to optimize a reference that is at top level, else we're making promises
+ * about the behavior of g() as well as f().)
+ *
+ * If modify-in-place is safe, the support function should return the
+ * address of the Param node that is to return a read-write pointer.
+ * (At most one of the references is allowed to do so.)  Otherwise,
+ * return NULL.
+ */
+typedef struct SupportRequestModifyInPlace
+{
+	NodeTag		type;
+
+	Oid			funcid;			/* PG_PROC OID of the target function */
+	List	   *args;			/* Arguments to the function */
+	int			paramid;		/* ID of Param(s) representing variable */
+} SupportRequestModifyInPlace;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index e5db6d6087..4c6b3ce998 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -57,10 +57,11 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
-NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
+NOTICE:  a = {1,1,2,3,42,3,1,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index ae878782b8..e3643fc7a8 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -29,6 +29,7 @@
 #include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
@@ -8404,7 +8405,7 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
-	ListCell   *lc;
+	Oid			prosupport;
 
 	/* Assume unsafe */
 	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
@@ -8473,64 +8474,51 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	{
 		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
-		/* We only trust standard varlena arrays to be safe */
-		/* TODO: install some extensibility here */
-		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
-			F_ARRAY_SUBSCRIPT_HANDLER)
-			return;
-
-		/* We can optimize the refexpr if it's the target, otherwise not */
-		if (sbsref->refexpr && IsA(sbsref->refexpr, Param))
-		{
-			Param	   *param = (Param *) sbsref->refexpr;
+		funcid = get_typsubscript(sbsref->refcontainertype, NULL);
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
-
-		return;
+		/*
+		 * We assume that only the refexpr and refassgnexpr (if any) are
+		 * relevant to the support function's decision.  If that turns out to
+		 * be a bad idea, we could incorporate the subscript expressions into
+		 * the fargs list somehow.
+		 */
+		fargs = list_make2(sbsref->refexpr, sbsref->refassgnexpr);
 	}
 	else
 		return;
 
 	/*
-	 * The top-level function must be one that we trust to be "safe".
-	 * Currently we hard-wire the list, but it would be very desirable to
-	 * allow extensions to mark their functions as safe ...
+	 * The top-level function must be one that can handle in-place update
+	 * safely.  We allow functions to declare their ability to do that via a
+	 * support function request.
 	 */
-	if (!(funcid == F_ARRAY_APPEND ||
-		  funcid == F_ARRAY_PREPEND))
-		return;
-
-	/*
-	 * The target variable (in the form of a Param) must appear as a direct
-	 * argument of the top-level function.  References further down in the
-	 * tree can't be optimized; but on the other hand, they don't invalidate
-	 * optimizing the top-level call, since that will be executed last.
-	 */
-	foreach(lc, fargs)
+	prosupport = get_func_support(funcid);
+	if (OidIsValid(prosupport))
 	{
-		Node	   *arg = (Node *) lfirst(lc);
+		SupportRequestModifyInPlace req;
+		Param	   *param;
 
-		if (arg && IsA(arg, Param))
-		{
-			Param	   *param = (Param *) arg;
+		req.type = T_SupportRequestModifyInPlace;
+		req.funcid = funcid;
+		req.args = fargs;
+		req.paramid = paramid;
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
+		param = (Param *)
+			DatumGetPointer(OidFunctionCall1(prosupport,
+											 PointerGetDatum(&req)));
+
+		if (param == NULL)
+			return;				/* support function fails */
+
+		/* Verify support function followed the API */
+		Assert(IsA(param, Param));
+		Assert(param->paramkind == PARAM_EXTERN);
+		Assert(param->paramid == paramid);
+
+		/* Found the Param we want to pass as read/write */
+		expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
+		expr->expr_rw_param = param;
+		return;
 	}
 }
 
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4a346203dc..da984a9941 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -53,6 +53,7 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 80328115a1..b6d3f9f659 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2787,6 +2787,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestModifyInPlace
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.43.5
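
The support-request flow in the plpgsql hunk above can be modelled in a few lines of standalone C.  This is only a sketch under stated assumptions: ToyNode, ToyModifyInPlaceRequest and toy_append_support are hypothetical stand-ins invented for illustration, not PostgreSQL's Param, SupportRequestModifyInPlace or the actual array_append support function.  It assumes an append-like function f(target, value) that may update in place only when the target variable is its first direct argument and appears nowhere else in the argument list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for expression nodes; not PostgreSQL's structs. */
typedef enum { TOY_PARAM, TOY_CONST } ToyNodeKind;

typedef struct ToyNode
{
    ToyNodeKind kind;
    int         paramid;    /* meaningful only for TOY_PARAM */
} ToyNode;

/* Hypothetical request, loosely shaped like the one the patch fills in. */
typedef struct ToyModifyInPlaceRequest
{
    ToyNode   **args;       /* direct arguments of the top-level call */
    size_t      nargs;
    int         paramid;    /* ID of the assignment's target variable */
} ToyModifyInPlaceRequest;

/*
 * Toy support function for an append-like function f(target, value).
 * Returns the target's node if in-place update looks safe, else NULL.
 */
static ToyNode *
toy_append_support(const ToyModifyInPlaceRequest *req)
{
    ToyNode    *first;
    size_t      i;

    if (req->nargs == 0 || req->args[0] == NULL)
        return NULL;

    /* The target variable must be the first direct argument ... */
    first = req->args[0];
    if (first->kind != TOY_PARAM || first->paramid != req->paramid)
        return NULL;

    /* ... and must not be referenced by any other direct argument. */
    for (i = 1; i < req->nargs; i++)
    {
        ToyNode    *arg = req->args[i];

        if (arg != NULL && arg->kind == TOY_PARAM &&
            arg->paramid == req->paramid)
            return NULL;
    }
    return first;
}
```

In this model a call shaped like "a := f(a, 42)" gets the target node back, while "a := f(a, a)" or "a := f(42, a)" gets NULL, which the caller treats as "do not optimize".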



^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-05 20:34  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  1 sibling, 0 replies; 34+ messages in thread

From: Tom Lane @ 2024-12-05 20:34 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

Michel Pelletier <[email protected]> writes:
> Here's a WIP patch for a pgexpanded example in src/test/modules.  The
> object is very simple and starts with an integer and increments that value
> every time it is expanded.  I added some regression tests that test two sql
> functions that replicate the expansion issue that I'm seeing with my
> extension.

I took a look through this and have a few comments:

* I really don't like the "increments when expanded" behavior.  That's
just insane from a semantic viewpoint: expanding and then flattening
a value should not change it, at least not in any user-visible way.
I get that you want the tests to exhibit how many times expansion
happened, but the LOGF() debug printouts seem to serve that need just
fine.  How about we just make the objects store a palloc'd string,
or some such?

* context_callback_exobj_free() seems bizarre.  I can see the
usefulness of the LOGF() printout for the test's purposes, but there's
no need for the pfree() because the whole context is about to go away.
I'm concerned that people using this as a skeleton might think they
need to do something equivalent, when they don't.

* Maybe the answer to the above objection is to make the expanded
object hold a malloc'd not palloc'd string, so that it's actually
necessary to have an explicit free() to avoid a permanent memory
leak.  This'd be silly in a real use-case, and should be
documented as such; but it seems good to have a callback function
to demonstrate how to clean up resources outside the expanded
object itself.

* BTW, it might be good to make LOGF() print more than just the
function's name; some indication of what it's been called on
could be very valuable.

* If we believe that this is a skeleton other people might
use as a starting point, it'd be good to have more comments
explaining what each bit is doing and why.  For instance,
I think it's important to note that a context callback function
isn't required if deleting the memory context is sufficient
to clean up the object.

* It seems like test_expand() and test_expand_expand() belong
to the pgexpanded.sql test script and should be defined there,
rather than being part of the extension.

* As the patch stands, the module would never be invoked by
"make check", other than by manually invoking that in the
new subdirectory.  You need to hook it into the parent
directory's Makefile.  And you need meson infrastructure
too, ie a meson.build file.  (Should be able to mostly crib
that from one of the sibling test modules.)

* This does not seem like a great idea to me:
   +SET client_min_messages = 'debug1';
It's pure luck if the test output doesn't get messed up by
somebody else's unrelated debug output, because there's
elog(DEBUG1) in a lot of places and more could show up any day.
I'd be inclined to make LOGF() emit messages at NOTICE level,
and then the test doesn't have to mess with client_min_messages
at all.

* There's a whole bunch of minor ways in which this doesn't
conform to project style:

- We don't use markdown (.md) for per-directory README files.
  (There's been discussion of that, but we're not there today.)
  I would not use a URL to cross-reference our docs, either.

- "create or replace function" isn't great style in test
  scripts, much less extension scripts.  We're not expecting
  conflicting functions to exist, and if one does it's better
  to error out than silently overwrite it.

- We put '#include "postgres.h"' in .c files, never in .h files.

- At least the .c and .h files should carry standard header
  comments with a PGDG copyright notice.

- Code layout and comments are frequently not per style.
  You can mostly fix this by running the .c and .h
  files through pgindent, but it might mangle your comments;
  usually best to make those look like project standard first.
  See https://www.postgresql.org/docs/devel/source-format.html

- Declaring static functions in a .h file doesn't seem like
  a great plan.  If the thing is only meant to be included in
  one .c file, why bother with a separate .h file at all?
  "static const ExpandedObjectMethods exobj_methods" belongs
  there even less, since you'd end with one copy per
  including file.

- We don't put "/* Local Variables: */" comments into code
  files; we generally expect the source files to be agnostic
  about what editors people use on them.


To move forward, I'd suggest dealing with these concerns first,
and then as a separate patch start adding things like
in-place-modification support.  It'd be cool to see the
results of that optimization manifest as fewer expansions
visible in the test's output.

			regards, tom lane
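
The point above about context callbacks (needed only for resources that live outside the memory context, such as a malloc'd string) can be sketched with a toy model in standalone C.  Everything here is an assumption made for illustration: ToyContext, ToyCallback, ToyObject and the toy_* functions are hypothetical and only mimic the general shape of a memory context with reset callbacks; they are not PostgreSQL's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback record, run when the owning context goes away. */
typedef struct ToyCallback
{
    void        (*func) (void *arg);
    void       *arg;
    struct ToyCallback *next;
} ToyCallback;

/* Header placed in front of every allocation owned by the context. */
typedef struct ToyChunk
{
    struct ToyChunk *next;
} ToyChunk;

typedef struct ToyContext
{
    ToyChunk   *chunks;     /* memory released by deleting the context */
    ToyCallback *callbacks; /* cleanup for resources outside the context */
} ToyContext;

typedef struct ToyObject
{
    ToyCallback cb;         /* stored in context memory with the object */
    char       *payload;    /* malloc'd: NOT freed by context deletion */
} ToyObject;

static int  toy_frees = 0;  /* counts explicit free() calls, for testing */

static void *
toy_alloc(ToyContext *cxt, size_t size)
{
    ToyChunk   *chunk = malloc(sizeof(ToyChunk) + size);

    if (chunk == NULL)
        abort();
    chunk->next = cxt->chunks;
    cxt->chunks = chunk;
    return chunk + 1;
}

/* The callback: release the one resource the context does not know about. */
static void
toy_object_release(void *arg)
{
    ToyObject  *obj = arg;

    free(obj->payload);
    obj->payload = NULL;
    toy_frees++;
}

static ToyObject *
toy_object_create(ToyContext *cxt, const char *text)
{
    ToyObject  *obj = toy_alloc(cxt, sizeof(ToyObject));

    obj->payload = malloc(strlen(text) + 1);
    if (obj->payload == NULL)
        abort();
    strcpy(obj->payload, text);
    obj->cb.func = toy_object_release;
    obj->cb.arg = obj;
    obj->cb.next = cxt->callbacks;
    cxt->callbacks = &obj->cb;
    return obj;
}

static void
toy_context_delete(ToyContext *cxt)
{
    ToyCallback *cb;

    /* Run callbacks first, while the context's memory is still valid. */
    for (cb = cxt->callbacks; cb != NULL; cb = cb->next)
        cb->func(cb->arg);
    cxt->callbacks = NULL;

    /* Then release everything the context owns; no per-object free needed. */
    while (cxt->chunks != NULL)
    {
        ToyChunk   *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
    }
}
```

Note that the ToyObject itself is never freed individually: deleting the context takes care of it, and only the malloc'd payload needs the callback.  If the payload were allocated from the context too, no callback would be required at all.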






^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-07 00:51  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2024-12-07 00:51 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Tue, Dec 3, 2024 at 4:42 PM Tom Lane <[email protected]> wrote:

> Michel Pelletier <[email protected]> writes:
> > Here's a WIP patch for a pgexpanded example in src/test/modules.
>
> I didn't look at your patch yet, but in the meantime here's an update
> that takes the next step towards what I promised.
>

Awesome!  I made a support function for my set_element(matrix, i, j, value)
function and it works great, but I do have a couple of questions about how to
move forward with some of my other methods.  For reference, here's the
set_element test function and a trace of function calls.  After
implementing the support function for set_element, expand_matrix is
called twice, first by wait() and then by nvals(), but not by any of the
set_element calls, so the true clause of the PLPGSQL_RWOPT_INPLACE case in
plpgsql_param_eval_var_check works as expected:

postgres=# create or replace function test_se(graph matrix) returns matrix
language plpgsql as
    $$
    declare
        nvals bigint;
    begin
        graph = wait(graph);
        nvals = nvals(graph);
        raise notice 'nvals: %', nvals;
        graph = set_element(graph, 4, 2, 42);
        graph = set_element(graph, 4, 3, 43);
        graph = set_element(graph, 4, 4, 44);
        return graph;
    end;
    $$;
CREATE FUNCTION
postgres=# select nvals(test_se('int32'::matrix));
DEBUG:  new_matrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  matrix_wait
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix            <- wait expands with support function
DEBUG:  new_matrix
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix            <- nvals() reexpands
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free
NOTICE:  nvals: 0
DEBUG:  scalar_int32
DEBUG:  new_scalar
DEBUG:  matrix_set_element
DEBUG:  DatumGetMatrix           <- set_element does not reexpand, yay!
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  scalar_int32
DEBUG:  new_scalar
DEBUG:  matrix_set_element
DEBUG:  DatumGetMatrix
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  scalar_int32
DEBUG:  new_scalar
DEBUG:  matrix_set_element
DEBUG:  DatumGetMatrix
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_matrix_free
┌───────┐
│ nvals │
├───────┤
│     3 │
└───────┘

My question is about nvals = nvals(graph) in the function above: the
object is flattened and then re-expanded, even after it was expanded
and returned by wait() [1] as an r/w pointer.  set_element honors the
expanded object from wait(), but nvals does not.  It seems like I want to
be able to pass the argument as an RO pointer, but I'm not sure how to
trigger the "else" clause in the PLPGSQL_RWOPT_INPLACE case.  I can see how
the support function triggers the "if", but I don't see a similar way to
trigger the "else".  I'm almost certainly missing something.

[1] (Due to the asynchronous nature of the GraphBLAS API, there is a
GrB_wait() function to wait until an object is "complete" computationally,
but since this object was just expanded, there is no pending work, so the
wait() is essentially a no-op but a handy way for me to return an r/w pointer
for subsequent operations.)

set_element and wait are the simple case where there is only one reference
to the r/w object in the argument list.  As we've discussed I have many
functions that can possibly have multiple references to the same object.
The core operation of the library is matrix multiplication, and all other
complex functions follow a very similar pattern, so I'll just focus on that
one here:

CREATE FUNCTION mxm(
    a matrix,
    b matrix,
    op semiring default null,
    inout c matrix default null,
    mask matrix default null,
    accum binaryop default null,
    descr descriptor default null
    )
RETURNS matrix

The order of arguments mostly follows the order in the C API.  a and b are
the left and right matrix operands and the matrix product is the return
value.  If c is not null, then it is a pre-created return value which may
contain partial results already from some previous operations, otherwise
mxm creates a new matrix of the correct dimensions and returns that.  I
think the inout is meaningless as it doesn't seem to change anything, but
I'm using it as a visual indication in code that c can be the return value
if it's not null.

Here's an example of doing Triangle Counting using Burkhardt's method [2],
where a, b, and the mask are all the same adjacency matrix (here the
Newman/karate graph).  The 'plus_pair' semiring is optimized for structural
counting, and the descriptor 's' tells SuiteSparse to only use the
structure of the mask and not to consider the values:

[2] https://doi.org/10.1177/1473871616666393

CREATE OR REPLACE FUNCTION public.tcount_burkhardt(graph matrix)
 RETURNS bigint
 LANGUAGE plpgsql
AS $$
    begin
        graph = wait(graph);
        graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph,
                    descr=>'s');
        return reduce_scalar(graph) / 6;
    end;
    $$;
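
The arithmetic behind the function above can be checked with a small standalone C sketch.  It is a dense, brute-force stand-in written for illustration and not how GraphBLAS or the extension computes it; count_triangles and its row-major array layout are assumptions.  For an undirected graph with symmetric 0/1 adjacency matrix A, it forms A*A restricted to the structure of A (each matching pair of entries contributes 1, as with a plus_pair semiring), sums all entries, and divides by six because every triangle is counted once per ordered edge (i, j), that is 3 edges times 2 directions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count triangles in an undirected graph given as a dense, symmetric,
 * row-major n x n adjacency matrix with a zero diagonal.
 */
static long
count_triangles(const int *adj, size_t n)
{
    long        total = 0;
    size_t      i, j, k;

    for (i = 0; i < n; i++)
    {
        for (j = 0; j < n; j++)
        {
            /* The mask: only positions where A itself has an entry. */
            if (!adj[i * n + j])
                continue;

            /* (A * A)[i][j] with "pair" as multiply and "plus" as add. */
            for (k = 0; k < n; k++)
            {
                if (adj[i * n + k] && adj[k * n + j])
                    total += 1;
            }
        }
    }

    /* Reduce to a scalar, then undo the sixfold overcount. */
    return total / 6;
}
```

For a single triangle (three mutually connected nodes) the masked product sums to 6 and the function returns 1; for a complete graph on four nodes it sums to 24 and returns 4.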

postgres=# select tcount_burkhardt(graph) from karateg;
DEBUG:  matrix_wait
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix                    <- wait expands and returns r/w pointer with support function
DEBUG:  new_matrix
DEBUG:  matrix_mxm                        <- mxm starts here
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                    <- expanding left operand again
DEBUG:  new_matrix
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                    <- expanding right operand again
DEBUG:  new_matrix
DEBUG:  DatumGetSemiring
DEBUG:  expand_semiring
DEBUG:  new_semiring
DEBUG:  new_matrix
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                    <- expanding mask argument again
DEBUG:  new_matrix
DEBUG:  DatumGetDescriptor
DEBUG:  expand_descriptor
DEBUG:  new_descriptor
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_descriptor_free
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_semiring_free
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_matrix_free
DEBUG:  matrix_reduce_scalar
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                    <- reduce also re-expands matrix
DEBUG:  new_matrix
DEBUG:  new_scalar
DEBUG:  scalar_div_int32
DEBUG:  DatumGetScalar
DEBUG:  new_scalar
DEBUG:  cast_scalar_int64
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  context_callback_scalar_free
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_matrix_free
┌──────────────────┐
│ tcount_burkhardt │
├──────────────────┤
│               45 │
└──────────────────┘

mxm calls expand_matrix three times, once for each of its three matrix
arguments.  Ideally I'd like the already expanded r/w pointer from wait() to
be honored by mxm so that, like set_element, it doesn't re-expand the object
at all.

Hope that question makes sense, still going on the main theory that I'm not
understanding the support function, and maybe the c argument thing being
optional throws a wrench in the plan, and I'm happy to try and find a
workaround for that.  Maybe always requiring the result to be
pre-constructed and then making c required and reassigning back to the same
input argument is the right approach?

Some good news I always like seeing is that 45 is the right answer.  Very
close to having optimal sparse linear algebra in Postgres!

Thanks for your help!  I'll move onto your comments on the test module next.

-Michel


^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-08 07:05  Michel Pelletier <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2024-12-08 07:05 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Fri, Dec 6, 2024 at 4:51 PM Michel Pelletier <[email protected]>
wrote:

>
>  My question is about nvals = nvals(graph) in that function above, the
> object is flattened and then re-expanded, even after the object was expanded
> and returned by wait() [1] into a r/w pointer.
>
...

>
> mxm calls expand_matrix three times for each of the three arguments.
> Ideally I'd like the already expanded rw pointer from wait() to be honored
> by mxm so that it doesn't re-expand the object three times but, like
> set_element, not at all.
>

My bad, sorry for the long confusing email, I figured out that I was
calling the wrong macro when getting my matrix datum and inadvertently
expanding RO pointers as well, I've fixed that issue, and everything is
working great!  No extra expansions and my support functions are working
well, I need to go through a few more places in the API to add more support
but otherwise the fixes Tom has put into plpgsql have worked perfectly and
the library now appears to be behaving optimally!  I can get down to doing
some benchmarks and head-to-head with the C and Python bindings to compare
against.

Thanks for your help Tom, I'm looking forward to the changes being released
soon!  In the meantime I'll keep a locally patched version for ongoing
testing purposes.

-Michel
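
The kind of mistake described above (a datum getter that re-expands a value that is already expanded) can be illustrated with a toy model in standalone C.  The thread does not show the actual macro, so this is a guess at the general shape of the bug, not the extension's code: ToyDatum, ToyMatrix and both getters are hypothetical.  The buggy getter accepts only read-write pointers as "already expanded"; the fixed one also accepts read-only pointers, which is enough for a function that only reads its argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the three forms a value can arrive in. */
typedef enum { TOY_FLAT, TOY_EXPANDED_RO, TOY_EXPANDED_RW } ToyDatumKind;

typedef struct ToyMatrix
{
    int         nvals;
} ToyMatrix;

typedef struct ToyDatum
{
    ToyDatumKind kind;
    ToyMatrix  *expanded;   /* valid unless kind == TOY_FLAT */
    int         flat_nvals; /* valid only when kind == TOY_FLAT */
} ToyDatum;

static int  toy_expansions = 0; /* counts expansions, like the traces */

/* Build a fresh expanded object; the expensive step we want to avoid. */
static ToyMatrix *
toy_expand(const ToyDatum *d)
{
    ToyMatrix  *m = malloc(sizeof(ToyMatrix));

    if (m == NULL)
        abort();
    m->nvals = (d->kind == TOY_FLAT) ? d->flat_nvals : d->expanded->nvals;
    toy_expansions++;
    return m;
}

/* Buggy: only a read-write pointer is treated as already expanded. */
static ToyMatrix *
toy_get_matrix_buggy(const ToyDatum *d)
{
    if (d->kind == TOY_EXPANDED_RW)
        return d->expanded;
    return toy_expand(d);
}

/* Fixed, for read-only use: any expanded pointer is returned as-is. */
static ToyMatrix *
toy_get_matrix(const ToyDatum *d)
{
    if (d->kind != TOY_FLAT)
        return d->expanded;
    return toy_expand(d);
}
```

A function that modifies its argument would still need a private or read-write object, so the fixed getter is only appropriate where the argument is read and never written.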



^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-18 20:22  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2024-12-18 20:22 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

Michel Pelletier <[email protected]> writes:
> My bad, sorry for the long confusing email, I figured out that I was
> calling the wrong macro when getting my matrix datum and inadvertently
> expanding RO pointers as well, I've fixed that issue, and everything is
> working great!  No extra expansions and my support functions are working
> well, I need to go through a few more places in the API to add more support
> but otherwise the fixes Tom has put into plpgsql have worked perfectly and
> the library now appears to be behaving optimally!  I can get down to doing
> some benchmarks and head-to-head with the C and Python bindings to compare
> against.

So, just to clarify where we're at: you are satisfied that the current
patch-set does what you need?

The other task we'd talked about was generalizing the existing
heuristics in exec_assign_value() and plpgsql_exec_function() that
say that array-type values should be forced into expanded R/W form
when being assigned to an array-type PL/pgSQL variable.  The argument
for that is that the PL/pgSQL function might subsequently do a lot of
subscripted accesses to the array (which'd benefit from working with
an expanded array) while never doing another assignment and thus not
having any opportunity to revisit the decision.  The counter-argument
is that it might *not* do such accesses, so that the expansion was
just a waste of cycles.  So this is squishy enough that I'd prefer to
have some solid use-cases to look at before trying to generalize it.

It's sounding to me like you're going to end up in a place where all
your values are passed around in expanded form already and so you have
little need for that optimization.  If so, I'd prefer not to go any
further than the present patch-set for now.  Adding "type support"
hooks as discussed would be a substantial amount of work, so I'd
like to have a more compelling case for it before doing that.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-23 03:52  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2024-12-23 03:52 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Wed, Dec 18, 2024 at 12:22 PM Tom Lane <[email protected]> wrote:

> Michel Pelletier <[email protected]> writes:
> > My bad, sorry for the long confusing email, I figured out that I was
> > calling the wrong macro when getting my matrix datum and inadvertently
> > expanding RO pointers as well, I've fixed that issue, and everything is
> > working great!  No extra expansions and my support functions are working
> > well, I need to go through a few more places in the API to add more
> support
> > but otherwise the fixes Tom has put into plpgsql have worked perfectly
> and
> > the library now appears to be behaving optimally!  I can get down to
> doing
> > some benchmarks and head-to-head with the C and Python bindings to
> compare
> > against.
>
> So, just to clarify where we're at: you are satisfied that the current
> patch-set does what you need?
>

I have some updates on this thread based on some graph algorithms I've
ported from the Python/C graphblas libraries.

All of the plpgsql expanded object optimizations so far are working well, I
can minimize object expansion in most cases, there are a couple I haven't
been able to work around but I'm still getting excellent benchmarking
numbers on some large test graphs:

                LiveJournal         Orkut
Nodes           3,997,962           3,072,441
Edges           34,681,185          117,185,037
Triangles       177,820,130         627,583,972

                Seconds Edges/S     Seconds Edges/S
Tri Count LL    2.80s   12,386,138  32.03s  3,658,602
Tri Count LU    1.91s   18,157,688  16.38s  7,156,338
Tri Centrality  1.55s   22,374,958  12.22s  9,589,610
Page Rank       8.10s   4,281,628   23.14s  5,064,176

That's on a 2020-era 4-core economy laptop and is in line with what the
C/Python/Julia bindings get on similar hardware.

There are a few cases where I have to force an expansion.  I work around
this by calling a `wait()` function, which expands the datum, calls
GrB_wait() on it (a no-op in this case) and returns an r/w pointer.  You can
see this in the following Triangle Counting function, which multiplies
the graph matrix by itself, using itself as a mask.  The resulting matrix
reduces to the triangle count (times six):

create or replace function tcount_b(graph matrix) returns bigint
language plpgsql as
    $$
    begin
        graph = wait(graph);
        graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph,
                    descr=>'s');
        return reduce_scalar(graph) / 6;
    end;
    $$;

DEBUG:  new_matrix
DEBUG:  flatten_matrix
DEBUG:  matrix_wait
DEBUG:  expand_matrix  -- expansion happens here in wait()
DEBUG:  new_matrix
DEBUG:  matrix_mxm      -- mxm does not re-expand the object, good!
DEBUG:  expand_semiring
DEBUG:  new_semiring
DEBUG:  new_matrix
DEBUG:  expand_descriptor
DEBUG:  new_descriptor
DEBUG:  matrix_reduce_scalar  -- neither does reduce, good!
DEBUG:  new_scalar
DEBUG:  scalar_div_int32
DEBUG:  new_scalar
DEBUG:  cast_scalar_int64

If I take out the call to wait(), then mxm calls expand_matrix 3 times as
it did before your optimizations.

> The other task we'd talked about was generalizing the existing
> heuristics in exec_assign_value() and plpgsql_exec_function() that
> say that array-type values should be forced into expanded R/W form
> when being assigned to an array-type PL/pgSQL variable.  The argument
> for that is that the PL/pgSQL function might subsequently do a lot of
> subscripted accesses to the array (which'd benefit from working with
> an expanded array) while never doing another assignment and thus not
> having any opportunity to revisit the decision.  The counter-argument
> is that it might *not* do such accesses, so that the expansion was
> just a waste of cycles.  So this is squishy enough that I'd prefer to
> have some solid use-cases to look at before trying to generalize it.
>
> It's sounding to me like you're going to end up in a place where all
> your values are passed around in expanded form already and so you have
> little need for that optimization.  If so, I'd prefer not to go any
> further than the present patch-set for now.  Adding "type support"
> hooks as discussed would be a substantial amount of work, so I'd
> like to have a more compelling case for it before doing that.
>

I agree it makes sense to have more use cases before making deeper
changes.  I only work with expanded forms, but need to call wait() to
pre-expand the object to avoid multiple expansions in functions that can
take the same object in multiple parameters.  This is a pretty common
pattern in GraphBLAS (and linear algebra in general), where (many) matrices
are combined with themselves in several ways, such as multiplication,
element-wise operations, and element masking.

I'm not sure if eliminating wait() is a good enough use case.  It would
definitely be nice to get rid of, but I can document it pretty thoroughly
and it's relatively easy to catch.


-Michel


^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-23 16:26  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2024-12-23 16:26 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

Michel Pelletier <[email protected]> writes:
> On Wed, Dec 18, 2024 at 12:22 PM Tom Lane <[email protected]> wrote:
>> So, just to clarify where we're at: you are satisfied that the current
>> patch-set does what you need?

> There are a few cases where I have to force an expansion, I work around
> this by calling a `wait()` function, which expands the datum, calls
> GrB_wait() on it (a nop in this case) and returns a r/w pointer.  You can
> see this in the following Triangle Counting function which is a matrix
> multiplication of a graph to itself, using itself as a mask.  This matrix
> reduces to the triangle count (times six):

> create or replace function tcount_b(graph matrix) returns bigint language
> plpgsql as
>     $$
>     begin
>         graph = wait(graph);
>         graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph,
> descr=>'s');
>         return reduce_scalar(graph) / 6;
>     end;
>     $$;

> ...
> I agree it makes sense to have more use cases before making deeper
> changes.  I only work with expanded forms,  but need to call wait() to
> pre-expand the object to avoid multiple expansions in functions that can
> take the same object in multiple parameters.

Hmm.  I agree that the wait() call is a bit ugly, but there are at
least two things that seem worth looking into before we go so far
as inventing type-support infrastructure:

1. Why isn't the incoming "graph" object already expanded?  It
often would be read-only, but that seems like it might be enough
given your description of GraphBLAS' behavior.

2. If the problem is primarily with passing the same object to
multiple parameters of a function, couldn't you detect and optimize
that within the function?  It would be messier than just blindly
applying DatumGetWhatever() to each parameter position; but with a
bit of thought I bet you could create some support logic that would
hide most of the mess.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-25 20:25  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2024-12-25 20:25 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Mon, Dec 23, 2024 at 8:26 AM Tom Lane <[email protected]> wrote:

> Michel Pelletier <[email protected]> writes:
> > ...
> > I agree it makes sense to have more use cases before making deeper
> > changes.  I only work with expanded forms,  but need to call wait() to
> > pre-expand the object to avoid multiple expansions in functions that can
> > take the same object in multiple parameters.
>
> Hmm.  I agree that the wait() call is a bit ugly, but there are at
> least two things that seem worth looking into before we go so far
> as inventing type-support infrastructure:
>

> 2. If the problem is primarily with passing the same object to
> multiple parameters of a function, couldn't you detect and optimize
> that within the function?  It would be messier than just blindly
> applying DatumGetWhatever() to each parameter position; but with a
> bit of thought I bet you could create some support logic that would
> hide most of the mess.
>

Ah that's a great idea, and it works beautifully!  Now I can do an
efficient triangle count without even needing a function; as the trace
below shows, expand_matrix is only called once:

postgres=# select reduce_scalar(mxm(graph, graph, mask=>graph, c=>graph)) /
6 as tcount from vlivejournals ;
DEBUG:  matrix_mxm
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix   -- only called once!
DEBUG:  new_matrix
DEBUG:  DatumGetMatrixMaybeA
DEBUG:  DatumGetMatrixMaybeAB
DEBUG:  DatumGetMatrixMaybeABC
DEBUG:  matrix_reduce_scalar
DEBUG:  DatumGetMatrix
DEBUG:  new_scalar
DEBUG:  scalar_div_int32
DEBUG:  new_scalar
DEBUG:  scalar_out
┌─────────────────┐
│     tcount      │
├─────────────────┤
│ int32:177820130 │
└─────────────────┘
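
The idea of detecting the same object in several parameter positions can be sketched in standalone C.  The thread shows only the names of the helpers in the trace, not their code, so this is a guess at one possible shape rather than the extension's implementation: ToyArgCache and toy_get_matrix_cached are hypothetical, and the sketch assumes that "the same object" can be recognized by comparing the incoming flat pointers for identity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_ARGS 4

/* Hypothetical expanded object; remembers which flat value it came from. */
typedef struct ToyMatrix
{
    const void *source;
} ToyMatrix;

/* Per-call cache mapping already-seen flat values to expanded objects. */
typedef struct ToyArgCache
{
    const void *flat[TOY_MAX_ARGS];
    ToyMatrix  *expanded[TOY_MAX_ARGS];
    size_t      used;
} ToyArgCache;

static int  toy_expansions = 0; /* counts expansions, like the traces */

static ToyMatrix *
toy_expand(const void *flat)
{
    ToyMatrix  *m = malloc(sizeof(ToyMatrix));

    if (m == NULL)
        abort();
    m->source = flat;
    toy_expansions++;
    return m;
}

/*
 * Fetch one argument: reuse the expanded object if an earlier argument
 * of the same call was the identical flat value, else expand it once.
 */
static ToyMatrix *
toy_get_matrix_cached(ToyArgCache *cache, const void *flat)
{
    ToyMatrix  *m;
    size_t      i;

    for (i = 0; i < cache->used; i++)
    {
        if (cache->flat[i] == flat)
            return cache->expanded[i];
    }

    m = toy_expand(flat);
    if (cache->used < TOY_MAX_ARGS)
    {
        cache->flat[cache->used] = flat;
        cache->expanded[cache->used] = m;
        cache->used++;
    }
    return m;
}
```

With a call shaped like mxm(graph, graph, mask=>graph), the three lookups would share one expansion.  Whether an operation tolerates its inputs, mask and output aliasing each other depends on the underlying library, so that is something to verify rather than assume.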

What a wonderful Christmas present to me, thank you Tom!

That pretty much resolves my main issues.  I'm still in an exploratory
phase but I think this gets me pretty far.  Is this something that has to
wait for 18 to be released?  Also do you need any further testing or code
reviewing from me?

-Michel


^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-12-27 00:32  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 0 replies; 34+ messages in thread

From: Tom Lane @ 2024-12-27 00:32 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

Michel Pelletier <[email protected]> writes:
> On Mon, Dec 23, 2024 at 8:26 AM Tom Lane <[email protected]> wrote:
>> 2. If the problem is primarily with passing the same object to
>> multiple parameters of a function, couldn't you detect and optimize
>> that within the function?

> Ah that's a great idea, and it works beautifully!  Now I can do an
> efficient triangle count without even needing a function, see below
> expand_matrix is only called once:

Nice!

> That pretty much resolves my main issues.  I'm still in an exploratory
> phase but I think this gets me pretty far.  Is this something that has to
> wait for 18 to be released?  Also do you need any further testing or code
> reviewing from me?

Yeah, this is not the kind of change we would back-patch, so it'll
have to wait for v18 (or later if people are slow to review it :-( ).

I don't think there's anything more we need from you, though of course
you should keep working on your code and report back if you hit
anything that needs further improvement on our end.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-04 16:37  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  2 siblings, 0 replies; 34+ messages in thread

From: Michel Pelletier @ 2025-01-04 16:37 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Tue, Nov 19, 2024 at 11:45 AM Tom Lane <[email protected]> wrote:

> Pavel Stehule <[email protected]> writes:
> > út 19. 11. 2024 v 18:51 odesílatel Michel Pelletier <
> > [email protected]> napsal:
> >> A couple years ago I tried to compress what I learned about expanded
> >> objects into a dummy extension that just provides the necessary
> >> boilerplate.  It wasn't great but a start:
> >> https://github.com/michelp/pgexpanded
> >> Pavel Stehule indicated this might be a good example to put into
> contrib:
>
> > another position can be src/test/modules - I think so your example is
> > "similar" to plsample
>
> Yeah.  I think we've largely adopted the position that contrib should
> contain installable modules that do something potentially useful to
> end-users.  A pure skeleton wouldn't be that, but if it's fleshed out
> enough to be test code for some core features then src/test/modules
> could be a reasonable home.
>

I've circled back on this task to do some work improving the skeleton code,
but going back through our thread I landed on this point Tom made about
usefulness vs pure skeleton.  My natural desire is to make a simple
expanded object that is also useful, so I brainstormed a bit and decided to
try something relatively simple but also (IMO) quite useful, an expanded
datum that wraps sqlite's serialize/deserialize API:

https://github.com/michelp/postgres-sqlite

As crazy as this sounds, there are some good use cases here.  It becomes very
easy to stuff relational data into a completely isolated box without having
to worry about things like very granular RLS policies or other issues of
traditional postgres multi-tenancy.  Being wire compatible with sqlite-wasm
also means databases can be slurped right from postgres into a browser and
synced with no need to transform data back and forth.  Large chunks of
complex structured relational data can be wiped out with a simple row
deletion.  And since sqlite can't escape from its box and has no scripting
ability, it makes a nice secure sandbox: even if users could corrupt it, the
impact on Postgres would be minimal.

It's only a bit more complicated than the pgexpanded skeleton, and the
expanded datum bits are in their own separate C file so they can be studied
in isolation.  Based on the above comments, this seems more appropriate for
contrib than test/modules, although I can see there may be some
understandable pushback about something so weird that also has an external
library dependency.

Any thoughts?  I want to nail down the core functionality before I go back
and clean up either case based on Tom's review comments on the skeleton
module (most of which still apply, since I used the skeleton to make it!).

-Michel



* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-04 19:35  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  2 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-01-04 19:35 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

Michel Pelletier <[email protected]> writes:
> I've circled back on this task to do some work improving the skeleton code,
> but going back through our thread I landed on this point Tom made about
> usefulness vs pure skeleton and my natural desire is to make a simple
> expanded object that is also useful, so I brainstormed a bit and decided to
> try something relatively simple but also (IMO) quite useful, an expanded
> datum that wraps sqlite's serialize/derserialize API:
> https://github.com/michelp/postgres-sqlite

I think the odds that we'd accept a module with a dependency on sqlite
are negligible.  It's too big of a build dependency for too little
return.  Also, I'm sure that a module defined like that would be a
pretty poor example/starting point for other expanded-object
applications: there'd be too many aspects that have only to do with
interfacing to sqlite, making it hard to see the expanded-object
forest for the sqlite trees.

I have to admit though that the forest-v-trees aspect makes it fairly
hard to think of any suitable example module that would serve much
real-world purpose.  Likely scenarios for expanded objects just have
a lot of functionality in them.  For instance, I thought for a moment
of suggesting that teaching contrib/hstore to work with expanded
representations of hstores could be useful.  But I'd forgotten how
much functionality that type has.  It'd be a big project and would
still have a lot of baggage.

			regards, tom lane







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-04 20:34  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2025-01-04 20:34 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Sat, Jan 4, 2025 at 11:35 AM Tom Lane <[email protected]> wrote:

> Michel Pelletier <[email protected]> writes:
> > I've circled back on this task to do some work improving the skeleton code,
> > but going back through our thread I landed on this point Tom made about
> > usefulness vs pure skeleton and my natural desire is to make a simple
> > expanded object that is also useful, so I brainstormed a bit and decided to
> > try something relatively simple but also (IMO) quite useful, an expanded
> > datum that wraps sqlite's serialize/derserialize API:
> > https://github.com/michelp/postgres-sqlite
>
> I think the odds that we'd accept a module with a dependency on sqlite
> are negligible.  It's too big of a build dependency for too little
> return.


That's fair.  I wasn't sure whether contrib modules could have optional build
dependencies, so that a module is simply skipped if its dependency is not
installed.  If that were possible, users who want to study the approach
could look at the code, and users who want to use the feature could install
the dependency, or the build could require a configuration flag like
--with-sqlite.


> Also, I'm sure that a module defined like that would be a
> pretty poor example/starting point for other expanded-object
> applications: there'd be too many aspects that have only to do with
> interfacing to sqlite, making it hard to see the expanded-object
> forest for the sqlite trees.
>

I don't agree it would be a poor example.  There are really only two touch
points with sqlite that matter: the call to sqlite3_serialize in the
flattening function and the call to sqlite3_deserialize in the expander.
Both consist of a simple pointer exchange and a memcpy.  That code is in its
own file, separate from the exec/query/dump code, which can be effectively
ignored by someone looking to understand the expansion life cycle.
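
To make the shape of that flatten/expand pairing concrete, here is a minimal,
self-contained C sketch.  It is not the code from postgres-sqlite and it does
not call sqlite or any Postgres API: toy_db, flat_db, toy_flatten, toy_expand
and demo_roundtrip are all hypothetical names, and plain malloc stands in for
memory-context allocation.  It only shows the idea of copying a serialized
image into one contiguous chunk and rebuilding a live handle from it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a live in-memory database handle (not sqlite3). */
typedef struct toy_db
{
	size_t		size;			/* number of bytes in image */
	unsigned char *image;		/* serialized bytes, never NULL */
} toy_db;

/* Hypothetical flat (on-disk style) form: a length word, then the bytes. */
typedef struct flat_db
{
	uint32_t	nbytes;			/* length of data[] */
	unsigned char data[];		/* the serialized image */
} flat_db;

/* "Flatten": copy the serialized image into one contiguous chunk. */
flat_db *
toy_flatten(const toy_db *db)
{
	flat_db    *flat = malloc(sizeof(flat_db) + db->size);

	if (flat == NULL)
		return NULL;
	flat->nbytes = (uint32_t) db->size;
	memcpy(flat->data, db->image, db->size);
	return flat;
}

/* "Expand": rebuild a live handle that owns its own copy of the bytes. */
toy_db *
toy_expand(const flat_db *flat)
{
	toy_db	   *db = malloc(sizeof(toy_db));

	if (db == NULL)
		return NULL;
	db->size = flat->nbytes;
	db->image = malloc(flat->nbytes ? flat->nbytes : 1);
	if (db->image == NULL)
	{
		free(db);
		return NULL;
	}
	memcpy(db->image, flat->data, flat->nbytes);
	return db;
}

/* Release a handle produced by toy_expand. */
void
toy_free(toy_db *db)
{
	if (db != NULL)
	{
		free(db->image);
		free(db);
	}
}

/* Flatten then expand; returns 0 if the bytes survive, nonzero otherwise. */
int
demo_roundtrip(void)
{
	unsigned char bytes[] = "toy database image";
	toy_db		src = {sizeof(bytes), bytes};
	flat_db    *flat = toy_flatten(&src);
	toy_db	   *copy;
	int			rc;

	if (flat == NULL)
		return -1;
	copy = toy_expand(flat);
	if (copy == NULL)
	{
		free(flat);
		return -1;
	}
	rc = (copy->size == src.size &&
		  memcmp(copy->image, bytes, src.size) == 0) ? 0 : 1;
	toy_free(copy);
	free(flat);
	return rc;
}
```

In a real expanded-object type, the two halves would roughly correspond to
the type's flatten_into method and its expand function, with the serialized
image coming from sqlite3_serialize and being handed to sqlite3_deserialize
rather than from a toy struct.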


> I have to admit though that the forest-v-trees aspect makes it fairly
> hard to think of any suitable example module that would serve much
> real-world purpose.  Likely scenarios for expanded objects just have
> a lot of functionality in them.


Agreed that an expanded object is only useful if it provides functionality.
My original pgexpanded extension from way back provided a dumb dense matrix
to do matrix multiplication, but I trimmed it down to just the counter
posted earlier in this thread, and you in turn mentioned that maybe it should
just be a malloc'ed string.  In either case the pointlessness of it bothers
me a bit, so I was hoping to find something that just crosses the line into
useful while still being a really simple expanded-object example.

-Michel



* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-15 18:09  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-01-15 18:09 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

I noticed that v2 of this patch series failed to apply after
7b27f5fd3, so here's v3.  No non-trivial changes.

			regards, tom lane



Attachments:

  [text/x-diff] v3-0001-Preliminary-refactoring.patch (9.8K, 2-v3-0001-Preliminary-refactoring.patch)
From d82b50dc222fb8751f45875fb3627bf08ca2e0cf Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Wed, 15 Jan 2025 12:37:54 -0500
Subject: [PATCH v3 1/4] Preliminary refactoring.

This short and boring patch simply moves the responsibility for
initializing PLpgSQL_expr.target_param into plpgsql parsing,
rather than doing it at first execution of the expr as before.
This doesn't save anything in terms of runtime, since the work was
trivial and done only once per expr anyway.  But it makes the info
available during parsing, which will be useful for the next step.

Likewise set PLpgSQL_expr.func during parsing.  According to the
comments, this was once impossible; but it's certainly possible
since we invented the plpgsql_curr_compile variable.  Again, this
saves little runtime, but it seems far cleaner conceptually.

While at it, I reordered stuff in struct PLpgSQL_expr to make it
clearer which fields are filled when, and merged some duplicative
code in pl_gram.y.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_exec.c | 27 ---------------
 src/pl/plpgsql/src/pl_gram.y | 65 ++++++++++++++++++++++++------------
 src/pl/plpgsql/src/plpgsql.h | 31 +++++++++--------
 3 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index e5b0da04e3..0465a70b18 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4174,12 +4174,6 @@ exec_prepare_plan(PLpgSQL_execstate *estate,
 	SPIPlanPtr	plan;
 	SPIPrepareOptions options;
 
-	/*
-	 * The grammar can't conveniently set expr->func while building the parse
-	 * tree, so make sure it's set before parser hooks need it.
-	 */
-	expr->func = estate->func;
-
 	/*
 	 * Generate and save the plan
 	 */
@@ -5016,21 +5010,7 @@ exec_assign_expr(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 	 * If first time through, create a plan for this expression.
 	 */
 	if (expr->plan == NULL)
-	{
-		/*
-		 * Mark the expression as being an assignment source, if target is a
-		 * simple variable.  (This is a bit messy, but it seems cleaner than
-		 * modifying the API of exec_prepare_plan for the purpose.  We need to
-		 * stash the target dno into the expr anyway, so that it will be
-		 * available if we have to replan.)
-		 */
-		if (target->dtype == PLPGSQL_DTYPE_VAR)
-			expr->target_param = target->dno;
-		else
-			expr->target_param = -1;	/* should be that already */
-
 		exec_prepare_plan(estate, expr, 0);
-	}
 
 	value = exec_eval_expr(estate, expr, &isnull, &valtype, &valtypmod);
 	exec_assign_value(estate, target, value, isnull, valtype, valtypmod);
@@ -6282,13 +6262,6 @@ setup_param_list(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 		 * that they are interrupting an active use of parameters.
 		 */
 		paramLI->parserSetupArg = expr;
-
-		/*
-		 * Also make sure this is set before parser hooks need it.  There is
-		 * no need to save and restore, since the value is always correct once
-		 * set.  (Should be set already, but let's be sure.)
-		 */
-		expr->func = estate->func;
 	}
 	else
 	{
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 063ed81f05..7ff6b663e3 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -61,6 +61,10 @@ static	bool			tok_is_keyword(int token, union YYSTYPE *lval,
 static	void			word_is_not_variable(PLword *word, int location, yyscan_t yyscanner);
 static	void			cword_is_not_variable(PLcword *cword, int location, yyscan_t yyscanner);
 static	void			current_token_is_not_variable(int tok, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yyscan_t yyscanner);
+static	PLpgSQL_expr	*make_plpgsql_expr(const char *query,
+										   RawParseMode parsemode);
+static	void			expr_is_assignment_source(PLpgSQL_expr *expr,
+												  PLpgSQL_datum *target);
 static	PLpgSQL_expr	*read_sql_construct(int until,
 											int until2,
 											int until3,
@@ -535,6 +539,10 @@ decl_statement	: decl_varname decl_const decl_datatype decl_collate decl_notnull
 									 errmsg("variable \"%s\" must have a default value, since it's declared NOT NULL",
 											var->refname),
 									 parser_errposition(@5)));
+
+						if (var->default_val != NULL)
+							expr_is_assignment_source(var->default_val,
+													  (PLpgSQL_datum *) var);
 					}
 				| decl_varname K_ALIAS K_FOR decl_aliasitem ';'
 					{
@@ -995,6 +1003,7 @@ stmt_assign		: T_DATUM
 													   false, true,
 													   NULL, NULL,
 													   &yylval, &yylloc, yyscanner);
+						expr_is_assignment_source(new->expr, $1.datum);
 
 						$$ = (PLpgSQL_stmt *) new;
 					}
@@ -2650,6 +2659,38 @@ current_token_is_not_variable(int tok, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yysca
 		yyerror(yyllocp, yyscanner, "syntax error");
 }
 
+/* Convenience routine to construct a PLpgSQL_expr struct */
+static PLpgSQL_expr *
+make_plpgsql_expr(const char *query,
+				  RawParseMode parsemode)
+{
+	PLpgSQL_expr *expr = palloc0(sizeof(PLpgSQL_expr));
+
+	expr->query = pstrdup(query);
+	expr->parseMode = parsemode;
+	expr->func = plpgsql_curr_compile;
+	expr->ns = plpgsql_ns_top();
+	/* might get changed later during parsing: */
+	expr->target_param = -1;
+	/* other fields are left as zeroes until first execution */
+	return expr;
+}
+
+/* Mark a PLpgSQL_expr as being the source of an assignment to target */
+static void
+expr_is_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
+{
+	/*
+	 * Mark the expression as being an assignment source, if target is a
+	 * simple variable.  We don't currently support optimized assignments to
+	 * other DTYPEs.
+	 */
+	if (target->dtype == PLPGSQL_DTYPE_VAR)
+		expr->target_param = target->dno;
+	else
+		expr->target_param = -1;	/* should be that already */
+}
+
 /* Convenience routine to read an expression with one possible terminator */
 static PLpgSQL_expr *
 read_sql_expression(int until, const char *expected, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yyscan_t yyscanner)
@@ -2793,13 +2834,7 @@ read_sql_construct(int until,
 	 */
 	plpgsql_append_source_text(&ds, startlocation, endlocation, yyscanner);
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = parsemode;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, parsemode);
 	pfree(ds.data);
 
 	if (valid_sql)
@@ -3121,13 +3156,7 @@ make_execsql_stmt(int firsttoken, int location, PLword *word, YYSTYPE *yylvalp,
 	while (ds.len > 0 && scanner_isspace(ds.data[ds.len - 1]))
 		ds.data[--ds.len] = '\0';
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_DEFAULT;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_DEFAULT);
 	pfree(ds.data);
 
 	check_sql_expr(expr->query, expr->parseMode, location, yyscanner);
@@ -4005,13 +4034,7 @@ read_cursor_args(PLpgSQL_var *cursor, int until, YYSTYPE *yylvalp, YYLTYPE *yyll
 			appendStringInfoString(&ds, ", ");
 	}
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_PLPGSQL_EXPR;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_PLPGSQL_EXPR);
 	pfree(ds.data);
 
 	/* Next we'd better find the until token */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index c3ce4161a3..67fdfb3141 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -219,14 +219,22 @@ typedef struct PLpgSQL_expr
 {
 	char	   *query;			/* query string, verbatim from function body */
 	RawParseMode parseMode;		/* raw_parser() mode to use */
-	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
-	Bitmapset  *paramnos;		/* all dnos referenced by this query */
+	struct PLpgSQL_function *func;	/* function containing this expr */
+	struct PLpgSQL_nsitem *ns;	/* namespace chain visible to this expr */
 
-	/* function containing this expr (not set until we first parse query) */
-	struct PLpgSQL_function *func;
+	/*
+	 * These fields are used to help optimize assignments to expanded-datum
+	 * variables.  If this expression is the source of an assignment to a
+	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 */
+	int			target_param;	/* dno of assign target, or -1 if none */
 
-	/* namespace chain visible to this expr */
-	struct PLpgSQL_nsitem *ns;
+	/*
+	 * Fields above are set during plpgsql parsing.  Remaining fields are left
+	 * as zeroes/NULLs until we first parse/plan the query.
+	 */
+	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
+	Bitmapset  *paramnos;		/* all dnos referenced by this query */
 
 	/* fields for "simple expression" fast-path execution: */
 	Expr	   *expr_simple_expr;	/* NULL means not a simple expr */
@@ -235,14 +243,11 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * These fields are used to optimize assignments to expanded-datum
-	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno; else it's -1.
-	 * If we match a Param within expr_simple_expr to such a variable, that
-	 * Param's address is stored in expr_rw_param; then expression code
-	 * generation will allow the value for that Param to be passed read/write.
+	 * If we match a Param within expr_simple_expr to the variable identified
+	 * by target_param, that Param's address is stored in expr_rw_param; then
+	 * expression code generation will allow the value for that Param to be
+	 * passed as a read/write expanded-object pointer.
 	 */
-	int			target_param;	/* dno of assign target, or -1 if none */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
-- 
2.43.5



  [text/x-diff] v3-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch (19.3K, 3-v3-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch)
From 944779537c256179747cd1cf77a11c8a88cf57db Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Wed, 15 Jan 2025 12:39:21 -0500
Subject: [PATCH v3 2/4] Detect whether plpgsql assignment targets are "local"
 variables.

Mark whether the target of a potentially optimizable assignment
is "local", in the sense of being declared inside any exception
block that could trap an error thrown from the assignment.
(This implies that we needn't preserve the variable's value
in case of an error.)

Normally, this requires a post-parsing scan of the function's
parse tree, since we don't know while parsing a BEGIN ...
construct whether we will find EXCEPTION at its end.  However,
if there are no BEGIN ... EXCEPTION blocks in the function at
all, then all assignments are local, even those to variables
representing function arguments.  We optimize that common case
by initializing the target_is_local flags to "true", and fixing
them up with a post-scan only if we found EXCEPTION.

The scan is implemented by code that's largely copied-and-pasted
from the nearby code to scan a plpgsql parse tree for deletion.
It's a bit annoying to have three copies of that now, but I'm
not seeing a way to refactor it that would save much code on net.

Note that variables' default-value expressions are never interesting
for expanded-variable optimization, since they couldn't contain a
reference to the target variable anyway.  But the code is set up
to compute their target_param and target_is_local correctly anyway,
for consistency and in case someone thinks of a use for that data.

I added a bit of plpgsql_dumptree support to help verify that
this code sets the flags as expected.  I'm not set on keeping
that, but I do want to keep the addition of a plpgsql_dumptree
call in plpgsql_compile_inline.  It's at best an oversight that
"#option dump" doesn't work in a DO block.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_comp.c  |  12 +
 src/pl/plpgsql/src/pl_funcs.c | 398 ++++++++++++++++++++++++++++++++++
 src/pl/plpgsql/src/pl_gram.y  |  15 ++
 src/pl/plpgsql/src/plpgsql.h  |   7 +-
 4 files changed, 431 insertions(+), 1 deletion(-)

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 9dc8218292..56b899693b 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -373,6 +373,7 @@ do_compile(FunctionCallInfo fcinfo,
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	/*
 	 * Initialize the compiler, particularly the namespace stack.  The
@@ -814,6 +815,9 @@ do_compile(FunctionCallInfo fcinfo,
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
 	/* Debug dump for completed functions */
 	if (plpgsql_DumpExecTree)
 		plpgsql_dumptree(function);
@@ -909,6 +913,7 @@ plpgsql_compile_inline(char *proc_source)
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	plpgsql_ns_init();
 	plpgsql_ns_push(func_name, PLPGSQL_LABEL_BLOCK);
@@ -966,6 +971,13 @@ plpgsql_compile_inline(char *proc_source)
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
+	/* Debug dump for completed functions */
+	if (plpgsql_DumpExecTree)
+		plpgsql_dumptree(function);
+
 	/*
 	 * Pop the error context stack
 	 */
diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index 8c827fe5cc..549e5d9292 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -333,6 +333,401 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind)
 }
 
 
+/**********************************************************************
+ * Mark assignment source expressions that have local target variables,
+ * that is, variables declared within the exception block most closely
+ * containing the assignment itself.  (Such target variables need not be
+ * preserved if the assignment's source expression raises an error,
+ * allowing better optimization.)
+ *
+ * This code need not be called if the plpgsql function contains no exception
+ * blocks, because expr_is_assignment_source() will have set all the flags
+ * to true already.  Also, we need not examine default-value expressions for
+ * variables, because variable declarations are necessarily within the nearest
+ * exception block.  (In DECLARE ... BEGIN ... EXCEPTION ... END, the variable
+ * initializations are done before entering the exception scope.)  So it's
+ * sufficient to find assignment statements.
+ *
+ * Within the recursion, local_dnos is a Bitmapset of dnos of variables
+ * known to be declared within the current exception level.
+ **********************************************************************/
+static void mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos);
+static void mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos);
+static void mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos);
+static void mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos);
+static void mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos);
+static void mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos);
+static void mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos);
+static void mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos);
+static void mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos);
+static void mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos);
+static void mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos);
+static void mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos);
+static void mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos);
+static void mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos);
+static void mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos);
+static void mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos);
+static void mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos);
+static void mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos);
+static void mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos);
+static void mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos);
+static void mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos);
+static void mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos);
+static void mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos);
+static void mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos);
+static void mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos);
+static void mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos);
+static void mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos);
+static void mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos);
+
+
+static void
+mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos)
+{
+	switch (stmt->cmd_type)
+	{
+		case PLPGSQL_STMT_BLOCK:
+			mark_block((PLpgSQL_stmt_block *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSIGN:
+			mark_assign((PLpgSQL_stmt_assign *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_IF:
+			mark_if((PLpgSQL_stmt_if *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CASE:
+			mark_case((PLpgSQL_stmt_case *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_LOOP:
+			mark_loop((PLpgSQL_stmt_loop *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_WHILE:
+			mark_while((PLpgSQL_stmt_while *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORI:
+			mark_fori((PLpgSQL_stmt_fori *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORS:
+			mark_fors((PLpgSQL_stmt_fors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORC:
+			mark_forc((PLpgSQL_stmt_forc *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FOREACH_A:
+			mark_foreach_a((PLpgSQL_stmt_foreach_a *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXIT:
+			mark_exit((PLpgSQL_stmt_exit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN:
+			mark_return((PLpgSQL_stmt_return *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_NEXT:
+			mark_return_next((PLpgSQL_stmt_return_next *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_QUERY:
+			mark_return_query((PLpgSQL_stmt_return_query *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RAISE:
+			mark_raise((PLpgSQL_stmt_raise *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSERT:
+			mark_assert((PLpgSQL_stmt_assert *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXECSQL:
+			mark_execsql((PLpgSQL_stmt_execsql *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNEXECUTE:
+			mark_dynexecute((PLpgSQL_stmt_dynexecute *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNFORS:
+			mark_dynfors((PLpgSQL_stmt_dynfors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_GETDIAG:
+			mark_getdiag((PLpgSQL_stmt_getdiag *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_OPEN:
+			mark_open((PLpgSQL_stmt_open *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FETCH:
+			mark_fetch((PLpgSQL_stmt_fetch *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CLOSE:
+			mark_close((PLpgSQL_stmt_close *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_PERFORM:
+			mark_perform((PLpgSQL_stmt_perform *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CALL:
+			mark_call((PLpgSQL_stmt_call *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_COMMIT:
+			mark_commit((PLpgSQL_stmt_commit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ROLLBACK:
+			mark_rollback((PLpgSQL_stmt_rollback *) stmt, local_dnos);
+			break;
+		default:
+			elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type);
+			break;
+	}
+}
+
+static void
+mark_stmts(List *stmts, Bitmapset *local_dnos)
+{
+	ListCell   *s;
+
+	foreach(s, stmts)
+	{
+		mark_stmt((PLpgSQL_stmt *) lfirst(s), local_dnos);
+	}
+}
+
+static void
+mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos)
+{
+	if (block->exceptions)
+	{
+		ListCell   *e;
+
+		/*
+		 * The block creates a new exception scope, so variables declared at
+		 * outer levels are nonlocal.  For that matter, so are any variables
+		 * declared in the block's DECLARE section.  Hence, we must pass down
+		 * empty local_dnos.
+		 */
+		mark_stmts(block->body, NULL);
+
+		foreach(e, block->exceptions->exc_list)
+		{
+			PLpgSQL_exception *exc = (PLpgSQL_exception *) lfirst(e);
+
+			mark_stmts(exc->action, NULL);
+		}
+	}
+	else
+	{
+		/*
+		 * Otherwise, the block does not create a new exception scope, and any
+		 * variables it declares can also be considered local within it.  Note
+		 * that only initializable datum types (VAR, REC) are included in
+		 * initvarnos; but that's sufficient for our purposes.
+		 */
+		local_dnos = bms_copy(local_dnos);
+		for (int i = 0; i < block->n_initvars; i++)
+			local_dnos = bms_add_member(local_dnos, block->initvarnos[i]);
+		mark_stmts(block->body, local_dnos);
+		bms_free(local_dnos);
+	}
+}
+
+static void
+mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos)
+{
+	PLpgSQL_expr *expr = stmt->expr;
+
+	/*
+	 * If the assignment target is a plain DTYPE_VAR datum, mark it as local
+	 * or not.  (If it's not a VAR, we don't care.)
+	 */
+	if (expr->target_param >= 0)
+		expr->target_is_local = bms_is_member(expr->target_param, local_dnos);
+}
+
+static void
+mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->then_body, local_dnos);
+	foreach(l, stmt->elsif_list)
+	{
+		PLpgSQL_if_elsif *elif = (PLpgSQL_if_elsif *) lfirst(l);
+
+		/* elif->cond cannot be an assignment source */
+		mark_stmts(elif->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_body, local_dnos);
+}
+
+static void
+mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->t_expr cannot be an assignment source */
+	foreach(l, stmt->case_when_list)
+	{
+		PLpgSQL_case_when *cwt = (PLpgSQL_case_when *) lfirst(l);
+
+		/* cwt->expr cannot be an assignment source */
+		mark_stmts(cwt->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_stmts, local_dnos);
+}
+
+static void
+mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->lower, upper, step cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+}
+
+static void
+mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->argquery cannot be an assignment source */
+}
+
+static void
+mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->argquery, query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+}
+
+static void
+mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->params cannot contain an assignment source */
+	/* stmt->options cannot contain an assignment source */
+}
+
+static void
+mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond, message cannot be an assignment source */
+}
+
+static void
+mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->sqlstmt cannot be an assignment source */
+}
+
+static void
+mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos)
+{
+}
+
+void
+plpgsql_mark_local_assignment_targets(PLpgSQL_function *func)
+{
+	Bitmapset  *local_dnos;
+
+	/* Function parameters can be treated as local targets at outer level */
+	local_dnos = NULL;
+	for (int i = 0; i < func->fn_nargs; i++)
+		local_dnos = bms_add_member(local_dnos, func->fn_argvarnos[i]);
+	if (func->action)
+		mark_block(func->action, local_dnos);
+	bms_free(local_dnos);
+}
+
+
 /**********************************************************************
  * Release memory when a PL/pgSQL function is no longer needed
  *
@@ -1594,6 +1989,9 @@ static void
 dump_expr(PLpgSQL_expr *expr)
 {
 	printf("'%s'", expr->query);
+	if (expr->target_param >= 0)
+		printf(" target %d%s", expr->target_param,
+			   expr->target_is_local ? " (local)" : "");
 }
 
 void
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 7ff6b663e3..2426ca4a04 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2327,6 +2327,8 @@ exception_sect	:
 						PLpgSQL_exception_block *new = palloc(sizeof(PLpgSQL_exception_block));
 						PLpgSQL_variable *var;
 
+						plpgsql_curr_compile->has_exception_block = true;
+
 						var = plpgsql_build_variable("sqlstate", lineno,
 													 plpgsql_build_datatype(TEXTOID,
 																			-1,
@@ -2672,6 +2674,7 @@ make_plpgsql_expr(const char *query,
 	expr->ns = plpgsql_ns_top();
 	/* might get changed later during parsing: */
 	expr->target_param = -1;
+	expr->target_is_local = false;
 	/* other fields are left as zeroes until first execution */
 	return expr;
 }
@@ -2686,9 +2689,21 @@ expr_is_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
 	 * other DTYPEs.
 	 */
 	if (target->dtype == PLPGSQL_DTYPE_VAR)
+	{
 		expr->target_param = target->dno;
+
+		/*
+		 * For now, assume the target is local to the nearest enclosing
+		 * exception block.  That's correct if the function contains no
+		 * exception blocks; otherwise we'll update this later.
+		 */
+		expr->target_is_local = true;
+	}
 	else
+	{
 		expr->target_param = -1;	/* should be that already */
+		expr->target_is_local = false; /* ditto */
+	}
 }
 
 /* Convenience routine to read an expression with one possible terminator */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 67fdfb3141..762af78a5e 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -225,9 +225,12 @@ typedef struct PLpgSQL_expr
 	/*
 	 * These fields are used to help optimize assignments to expanded-datum
 	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 * simple variable, target_param holds that variable's dno (else it's -1),
+	 * and target_is_local indicates whether the target is declared inside the
+	 * closest exception block containing the assignment.
 	 */
 	int			target_param;	/* dno of assign target, or -1 if none */
+	bool		target_is_local;	/* is it within nearest exception block? */
 
 	/*
 	 * Fields above are set during plpgsql parsing.  Remaining fields are left
@@ -1014,6 +1017,7 @@ typedef struct PLpgSQL_function
 	/* data derived while parsing body */
 	unsigned int nstatements;	/* counter for assigning stmtids */
 	bool		requires_procedure_resowner;	/* contains CALL or DO? */
+	bool		has_exception_block;	/* contains BEGIN...EXCEPTION? */
 
 	/* these fields change when the function is used */
 	struct PLpgSQL_execstate *cur_estate;
@@ -1314,6 +1318,7 @@ extern PLpgSQL_nsitem *plpgsql_ns_find_nearest_loop(PLpgSQL_nsitem *ns_cur);
  */
 extern PGDLLEXPORT const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt);
 extern const char *plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind);
+extern void plpgsql_mark_local_assignment_targets(PLpgSQL_function *func);
 extern void plpgsql_free_function_memory(PLpgSQL_function *func);
 extern void plpgsql_dumptree(PLpgSQL_function *func);
 
-- 
2.43.5



  [text/x-diff] v3-0003-Implement-new-optimization-rule-for-updates-of-ex.patch (26.3K, 4-v3-0003-Implement-new-optimization-rule-for-updates-of-ex.patch)
From 102d4da7637c1c0f34c7c6c71777a5cedf70258a Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Wed, 15 Jan 2025 12:42:04 -0500
Subject: [PATCH v3 3/4] Implement new optimization rule for updates of
 expanded variables.

If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
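Note: the rule selection described above can be modelled outside the
backend.  The following is only a rough sketch under stated assumptions,
not the patch's code: every name in it (rwopt_sketch, assignment_sketch,
choose_rwopt) is hypothetical, and it reduces the expression tree to two
counters, whereas the real exec_check_rw_parameter() walks the tree and
inspects the top-level node.  Whether the top-level function is trusted
is simply taken as an input here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PLPGSQL_RWOPT_* outcomes */
typedef enum
{
	RWOPT_SKETCH_NOPE,			/* no optimization */
	RWOPT_SKETCH_TRANSFER,		/* hand the value over to the expression */
	RWOPT_SKETCH_INPLACE		/* pass R/W to a trusted top-level function */
} rwopt_sketch;

/*
 * Toy description of "var := f(...)".  This is an assumption-laden
 * simplification: references to var are only counted, not located.
 */
typedef struct
{
	bool		target_is_local;	/* no enclosing exception block cares */
	bool		top_func_is_safe;	/* f leaves its input intact on error */
	int			direct_refs;	/* var passed directly as an argument of f */
	int			nested_refs;	/* var referenced deeper, e.g. in var[3] */
} assignment_sketch;

static rwopt_sketch
choose_rwopt(const assignment_sketch *a)
{
	int			total = a->direct_refs + a->nested_refs;

	/* Nothing to optimize if the RHS never mentions the target */
	if (total == 0)
		return RWOPT_SKETCH_NOPE;

	/* Rule 1: local target referenced exactly once, any RHS shape */
	if (a->target_is_local && total == 1)
		return RWOPT_SKETCH_TRANSFER;

	/* Rule 2: trusted top-level function receiving the target directly */
	if (a->top_func_is_safe && a->direct_refs >= 1)
		return RWOPT_SKETCH_INPLACE;

	return RWOPT_SKETCH_NOPE;
}
```

Fed with the three assignments from the new regression test, this model
picks "transfer" for a := array_append(a, 42), "inplace" for
a := a || a[3] (one direct and one nested reference), and no
optimization for a := a || a, provided the top-level function of the
last one is treated as untrusted.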
 src/include/executor/execExpr.h               |   1 +
 src/pl/plpgsql/src/expected/plpgsql_array.out |   9 +
 src/pl/plpgsql/src/pl_exec.c                  | 376 +++++++++++++++---
 src/pl/plpgsql/src/plpgsql.h                  |  22 +-
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |   9 +
 src/tools/pgindent/typedefs.list              |   2 +
 6 files changed, 357 insertions(+), 62 deletions(-)

diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 1e42c13178..19322fa945 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -406,6 +406,7 @@ typedef struct ExprEvalStep
 		{
 			ExecEvalSubroutine paramfunc;	/* add-on evaluation subroutine */
 			void	   *paramarg;	/* private data for same */
+			void	   *paramarg2;	/* more private data for same */
 			int			paramid;	/* numeric ID for parameter */
 			Oid			paramtype;	/* OID of parameter's datatype */
 		}			cparam;
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index ad60e0e8be..e5db6d6087 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -52,6 +52,15 @@ NOTICE:  a = ("{""(,11)""}",), a.c1[1].i = 11
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 NOTICE:  a = {1,2,3}
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 0465a70b18..7c33c49e65 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -251,6 +251,15 @@ static HTAB *shared_cast_hash = NULL;
 	else \
 		Assert(rc == PLPGSQL_RC_OK)
 
+/* State struct for count_param_references */
+typedef struct count_param_references_context
+{
+	int			paramid;
+	int			count;
+	Param	   *last_param;
+} count_param_references_context;
+
+
 /************************************************************
  * Local function forward declarations
  ************************************************************/
@@ -336,7 +345,9 @@ static void exec_prepare_plan(PLpgSQL_execstate *estate,
 static void exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr);
 static bool exec_is_simple_query(PLpgSQL_expr *expr);
 static void exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan);
-static void exec_check_rw_parameter(PLpgSQL_expr *expr);
+static void exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid);
+static bool count_param_references(Node *node,
+								   count_param_references_context *context);
 static void exec_check_assignable(PLpgSQL_execstate *estate, int dno);
 static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
 								  PLpgSQL_expr *expr,
@@ -384,6 +395,10 @@ static ParamExternData *plpgsql_param_fetch(ParamListInfo params,
 static void plpgsql_param_compile(ParamListInfo params, Param *param,
 								  ExprState *state,
 								  Datum *resv, bool *resnull);
+static void plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+										 ExprContext *econtext);
+static void plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+											ExprContext *econtext);
 static void plpgsql_param_eval_var(ExprState *state, ExprEvalStep *op,
 								   ExprContext *econtext);
 static void plpgsql_param_eval_var_ro(ExprState *state, ExprEvalStep *op,
@@ -6078,10 +6093,13 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 
 		/*
 		 * Reset to "not simple" to leave sane state (with no dangling
-		 * pointers) in case we fail while replanning.  expr_simple_plansource
-		 * can be left alone however, as that cannot move.
+		 * pointers) in case we fail while replanning.  We'll need to
+		 * re-determine simplicity and R/W optimizability anyway, since those
+		 * could change with the new plan.  expr_simple_plansource can be left
+		 * alone however, as that cannot move.
 		 */
 		expr->expr_simple_expr = NULL;
+		expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 		expr->expr_rw_param = NULL;
 		expr->expr_simple_plan = NULL;
 		expr->expr_simple_plan_lxid = InvalidLocalTransactionId;
@@ -6439,16 +6457,27 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	scratch.resnull = resnull;
 
 	/*
-	 * Select appropriate eval function.  It seems worth special-casing
-	 * DTYPE_VAR and DTYPE_RECFIELD for performance.  Also, we can determine
-	 * in advance whether MakeExpandedObjectReadOnly() will be required.
-	 * Currently, only VAR/PROMISE and REC datums could contain read/write
-	 * expanded objects.
+	 * Select appropriate eval function.
+	 *
+	 * First, if this Param references the same varlena-type DTYPE_VAR datum
+	 * that is the target of the assignment containing this simple expression,
+	 * then it's possible we will be able to optimize handling of R/W expanded
+	 * datums.  We don't want to do the work needed to determine that unless
+	 * we actually see a R/W expanded datum at runtime, so install a checking
+	 * function that will figure that out when needed.
+	 *
+	 * Otherwise, it seems worth special-casing DTYPE_VAR and DTYPE_RECFIELD
+	 * for performance.  Also, we can determine in advance whether
+	 * MakeExpandedObjectReadOnly() will be required.  Currently, only
+	 * VAR/PROMISE and REC datums could contain read/write expanded objects.
 	 */
 	if (datum->dtype == PLPGSQL_DTYPE_VAR)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		bool		isvarlena = (((PLpgSQL_var *) datum)->datatype->typlen == -1);
+
+		if (isvarlena && dno == expr->target_param && expr->expr_simple_expr)
+			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_check;
+		else if (isvarlena)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var;
@@ -6457,14 +6486,12 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_recfield;
 	else if (datum->dtype == PLPGSQL_DTYPE_PROMISE)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		if (((PLpgSQL_var *) datum)->datatype->typlen == -1)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
 	}
-	else if (datum->dtype == PLPGSQL_DTYPE_REC &&
-			 param != expr->expr_rw_param)
+	else if (datum->dtype == PLPGSQL_DTYPE_REC)
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 	else
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
@@ -6473,14 +6500,170 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	 * Note: it's tempting to use paramarg to store the estate pointer and
 	 * thereby save an indirection or two in the eval functions.  But that
 	 * doesn't work because the compiled expression might be used with
-	 * different estates for the same PL/pgSQL function.
+	 * different estates for the same PL/pgSQL function.  Instead, store
+	 * pointers to the PLpgSQL_expr as well as this specific Param, to support
+	 * plpgsql_param_eval_var_check().
 	 */
-	scratch.d.cparam.paramarg = NULL;
+	scratch.d.cparam.paramarg = expr;
+	scratch.d.cparam.paramarg2 = param;
 	scratch.d.cparam.paramid = param->paramid;
 	scratch.d.cparam.paramtype = param->paramtype;
 	ExprEvalPushStep(state, &scratch);
 }
 
+/*
+ * plpgsql_param_eval_var_check		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we may need to determine the applicability of a read/write optimization,
+ * but we've not done that yet.
+ */
+static void
+plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+							 ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, it's time to
+	 * decide whether/how to optimize the assignment.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		PLpgSQL_expr *expr = (PLpgSQL_expr *) op->d.cparam.paramarg;
+		Param	   *param = (Param *) op->d.cparam.paramarg2;
+
+		/*
+		 * We might have already figured this out while evaluating some other
+		 * Param referencing the same variable.
+		 */
+		if (expr->expr_rwopt == PLPGSQL_RWOPT_UNKNOWN)
+			exec_check_rw_parameter(expr, op->d.cparam.paramid);
+
+		/*
+		 * Update the callback pointer to match what we decided to do, and
+		 * pass off this execution to the selected function.
+		 */
+		switch (expr->expr_rwopt)
+		{
+			case PLPGSQL_RWOPT_UNKNOWN:
+				Assert(false);
+				break;
+			case PLPGSQL_RWOPT_NOPE:
+				/* Force the value to read-only in all future executions */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+				plpgsql_param_eval_var_ro(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_TRANSFER:
+				/* There can be only one matching Param in this case */
+				Assert(param == expr->expr_rw_param);
+				/* When the value is read/write, transfer to exec context */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_transfer;
+				plpgsql_param_eval_var_transfer(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_INPLACE:
+				if (param == expr->expr_rw_param)
+				{
+					/* When the value is read/write, deliver it as-is */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var;
+					plpgsql_param_eval_var(state, op, econtext);
+				}
+				else
+				{
+					/* Not the optimizable reference, so force to read-only */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+					plpgsql_param_eval_var_ro(state, op, econtext);
+				}
+				break;
+		}
+		return;
+	}
+
+	/*
+	 * Otherwise, continue to postpone that decision, and execute an inlined
+	 * version of exec_eval_datum().  Although this value could potentially
+	 * need MakeExpandedObjectReadOnly, we know it doesn't right now.
+	 */
+	*op->resvalue = var->value;
+	*op->resnull = var->isnull;
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
+/*
+ * plpgsql_param_eval_var_transfer		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we have determined that a read/write expanded value can be handed off
+ * into execution of the expression (and then possibly returned to our
+ * function's ownership afterwards).  We have to test though, because the
+ * variable might not contain a read/write expanded value during this
+ * execution.
+ */
+static void
+plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+								ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, transfer its
+	 * ownership into the expression execution context, then drop our own
+	 * reference to the value by setting the variable to NULL.  That'll be
+	 * overwritten (perhaps with this same object) when control comes back
+	 * from the expression.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		*op->resvalue = TransferExpandedObject(var->value,
+											   get_eval_mcontext(estate));
+		*op->resnull = false;
+
+		var->value = (Datum) 0;
+		var->isnull = true;
+		var->freeval = false;
+	}
+	else
+	{
+		/*
+		 * Otherwise we can pass the variable's value directly; we now know
+		 * that MakeExpandedObjectReadOnly isn't needed.
+		 */
+		*op->resvalue = var->value;
+		*op->resnull = var->isnull;
+	}
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
 /*
  * plpgsql_param_eval_var		evaluation of EEOP_PARAM_CALLBACK step
  *
@@ -7957,9 +8140,10 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 	MemoryContext oldcontext;
 
 	/*
-	 * Initialize to "not simple".
+	 * Initialize to "not simple", and reset R/W optimizability.
 	 */
 	expr->expr_simple_expr = NULL;
+	expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 	expr->expr_rw_param = NULL;
 
 	/*
@@ -8164,88 +8348,133 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
 	expr->expr_simple_typmod = exprTypmod((Node *) tle_expr);
 	/* We also want to remember if it is immutable or not */
 	expr->expr_simple_mutable = contain_mutable_functions((Node *) tle_expr);
-
-	/*
-	 * Lastly, check to see if there's a possibility of optimizing a
-	 * read/write parameter.
-	 */
-	exec_check_rw_parameter(expr);
 }
 
 /*
  * exec_check_rw_parameter --- can we pass expanded object as read/write param?
  *
- * If we have an assignment like "x := array_append(x, foo)" in which the
+ * There are two separate cases in which we can optimize an update to a
+ * variable that has a read/write expanded value by letting the called
+ * expression operate directly on the expanded value.  In both cases we
+ * are considering assignments like "var := array_append(var, foo)" where
+ * the assignment target is also an input to the RHS expression.
+ *
+ * Case 1 (RWOPT_TRANSFER rule): if the variable is "local" in the sense that
+ * its declaration is not outside any BEGIN...EXCEPTION block surrounding the
+ * assignment, then we do not need to worry about preserving its value if the
+ * RHS expression throws an error.  If in addition the variable is referenced
+ * exactly once in the RHS expression, then we can optimize by converting the
+ * read/write expanded value into a transient value within the expression
+ * evaluation context, and then setting the variable's recorded value to NULL
+ * to prevent double-free attempts.  This works regardless of any other
+ * details of the RHS expression.  If the expression eventually returns that
+ * same expanded object (possibly modified) then the variable will re-acquire
+ * ownership; while if it returns something else or throws an error, the
+ * expanded object will be discarded as part of cleanup of the evaluation
+ * context.
+ *
+ * Case 2 (RWOPT_INPLACE rule): if we have a non-local assignment or if
+ * it looks like "var := array_append(var, var[1])" with multiple references
+ * to the target variable, then we can't use case 1.  Nonetheless, if the
  * top-level function is trusted not to corrupt its argument in case of an
- * error, then when x has an expanded object as value, it is safe to pass the
- * value as a read/write pointer and let the function modify the value
- * in-place.
+ * error, then when the var has an expanded object as value, it is safe to
+ * pass the value as a read/write pointer to the top-level function and let
+ * the function modify the value in-place.  (Any other references have to be
+ * passed as read-only pointers as usual.)  Only the top-level function has to
+ * be trusted, since if anything further down fails, the object hasn't been
+ * modified yet.
  *
- * This function checks for a safe expression, and sets expr->expr_rw_param
- * to the address of any Param within the expression that can be passed as
- * read/write (there can be only one); or to NULL when there is no safe Param.
+ * This function checks to see if the assignment is optimizable according
+ * to either rule, and updates expr->expr_rwopt accordingly.  In addition,
+ * it sets expr->expr_rw_param to the address of the Param within the
+ * expression that can be passed as read/write (there can be only one);
+ * or to NULL when there is no safe Param.
  *
- * Note that this mechanism intentionally applies the safety labeling to just
- * one Param; the expression could contain other Params referencing the target
- * variable, but those must still be treated as read-only.
+ * Note that this mechanism intentionally allows just one Param to emit a
+ * read/write pointer; in case 2, the expression could contain other Params
+ * referencing the target variable, but those must be treated as read-only.
  *
  * Also note that we only apply this optimization within simple expressions.
  * There's no point in it for non-simple expressions, because the
  * exec_run_select code path will flatten any expanded result anyway.
- * Also, it's safe to assume that an expr_simple_expr tree won't get copied
- * somewhere before it gets compiled, so that looking for pointer equality
- * to expr_rw_param will work for matching the target Param.  That'd be much
- * shakier in the general case.
  */
 static void
-exec_check_rw_parameter(PLpgSQL_expr *expr)
+exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 {
-	int			target_dno;
+	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
 	ListCell   *lc;
 
 	/* Assume unsafe */
+	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
 	expr->expr_rw_param = NULL;
 
-	/* Done if expression isn't an assignment source */
-	target_dno = expr->target_param;
-	if (target_dno < 0)
-		return;
+	/* Shouldn't be here for non-simple expression */
+	Assert(sexpr != NULL);
+
+	/* Param should match the expression's assignment target, too */
+	Assert(paramid == expr->target_param + 1);
 
 	/*
-	 * If target variable isn't referenced by expression, no need to look
-	 * further.
+	 * If the assignment is to a "local" variable (one whose value won't
+	 * matter anymore if expression evaluation fails), and this Param is the
+	 * only reference to that variable in the expression, then we can
+	 * unconditionally optimize using the "transfer" method.
 	 */
-	if (!bms_is_member(target_dno, expr->paramnos))
-		return;
+	if (expr->target_is_local)
+	{
+		count_param_references_context context;
 
-	/* Shouldn't be here for non-simple expression */
-	Assert(expr->expr_simple_expr != NULL);
+		/* See how many references there are, and find one of them */
+		context.paramid = paramid;
+		context.count = 0;
+		context.last_param = NULL;
+		(void) count_param_references((Node *) sexpr, &context);
+
+		/* If we're here, the expr must contain some reference to the var */
+		Assert(context.count > 0);
+
+		/* If exactly one reference, success! */
+		if (context.count == 1)
+		{
+			expr->expr_rwopt = PLPGSQL_RWOPT_TRANSFER;
+			expr->expr_rw_param = context.last_param;
+			return;
+		}
+	}
 
 	/*
+	 * Otherwise, see if we can trust the expression's top-level function to
+	 * apply the "inplace" method.
+	 *
 	 * Top level of expression must be a simple FuncExpr, OpExpr, or
-	 * SubscriptingRef, else we can't optimize.
+	 * SubscriptingRef, else we can't identify which function is relevant. But
+	 * it's okay to look through any RelabelType above that, since that can't
+	 * fail.
 	 */
-	if (IsA(expr->expr_simple_expr, FuncExpr))
+	if (IsA(sexpr, RelabelType))
+		sexpr = ((RelabelType *) sexpr)->arg;
+	if (IsA(sexpr, FuncExpr))
 	{
-		FuncExpr   *fexpr = (FuncExpr *) expr->expr_simple_expr;
+		FuncExpr   *fexpr = (FuncExpr *) sexpr;
 
 		funcid = fexpr->funcid;
 		fargs = fexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, OpExpr))
+	else if (IsA(sexpr, OpExpr))
 	{
-		OpExpr	   *opexpr = (OpExpr *) expr->expr_simple_expr;
+		OpExpr	   *opexpr = (OpExpr *) sexpr;
 
 		funcid = opexpr->opfuncid;
 		fargs = opexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, SubscriptingRef))
+	else if (IsA(sexpr, SubscriptingRef))
 	{
-		SubscriptingRef *sbsref = (SubscriptingRef *) expr->expr_simple_expr;
+		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
 		/* We only trust standard varlena arrays to be safe */
+		/* TODO: install some extensibility here */
 		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
 			F_ARRAY_SUBSCRIPT_HANDLER)
 			return;
@@ -8256,9 +8485,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) sbsref->refexpr;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8293,9 +8523,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) arg;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8303,6 +8534,35 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 	}
 }
 
+/*
+ * Count Params referencing the specified paramid, and return one of them
+ * if there are any.
+ *
+ * We actually only need to distinguish 0, 1, and N references; so we can
+ * abort the tree traversal as soon as we've found two.
+ */
+static bool
+count_param_references(Node *node, count_param_references_context *context)
+{
+	if (node == NULL)
+		return false;
+	else if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		if (param->paramkind == PARAM_EXTERN &&
+			param->paramid == context->paramid)
+		{
+			context->last_param = param;
+			if (++(context->count) > 1)
+				return true;	/* abort tree traversal */
+		}
+		return false;
+	}
+	else
+		return expression_tree_walker(node, count_param_references, context);
+}
+
 /*
  * exec_check_assignable --- is it OK to assign to the indicated datum?
  *
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 762af78a5e..93e47ab8ca 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -187,6 +187,17 @@ typedef enum PLpgSQL_resolve_option
 	PLPGSQL_RESOLVE_COLUMN,		/* prefer table column to plpgsql var */
 } PLpgSQL_resolve_option;
 
+/*
+ * Status of optimization of assignment to a read/write expanded object
+ */
+typedef enum PLpgSQL_rwopt
+{
+	PLPGSQL_RWOPT_UNKNOWN = 0,	/* applicability not determined yet */
+	PLPGSQL_RWOPT_NOPE,			/* cannot do any optimization */
+	PLPGSQL_RWOPT_TRANSFER,		/* transfer the old value into expr state */
+	PLPGSQL_RWOPT_INPLACE,		/* pass value as R/W to top-level function */
+} PLpgSQL_rwopt;
+
 
 /**********************************************************************
  * Node and structure definitions
@@ -246,11 +257,14 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * If we match a Param within expr_simple_expr to the variable identified
-	 * by target_param, that Param's address is stored in expr_rw_param; then
-	 * expression code generation will allow the value for that Param to be
-	 * passed as a read/write expanded-object pointer.
+	 * expr_rwopt tracks whether we have determined that assignment to a
+	 * read/write expanded object (stored in the target_param datum) can be
+	 * optimized by passing it to the expr as a read/write expanded-object
+	 * pointer.  If so, expr_rw_param identifies the specific Param that
+	 * should emit a read/write pointer; any others will emit read-only
+	 * pointers.
 	 */
+	PLpgSQL_rwopt expr_rwopt;	/* can we apply R/W optimization? */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4b9ff51594..4a346203dc 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -48,6 +48,15 @@ begin a.c1[1].i := 11; raise notice 'a = %, a.c1[1].i = %', a, a.c1[1].i; end$$;
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+
 create temp table onecol as select array[1,2] as f1;
 
 do $$ declare a int[];
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 94dc956ae8..bf78131001 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1872,6 +1872,7 @@ PLpgSQL_rec
 PLpgSQL_recfield
 PLpgSQL_resolve_option
 PLpgSQL_row
+PLpgSQL_rwopt
 PLpgSQL_stmt
 PLpgSQL_stmt_assert
 PLpgSQL_stmt_assign
@@ -3402,6 +3403,7 @@ core_yy_extra_type
 core_yyscan_t
 corrupt_items
 cost_qual_eval_context
+count_param_references_context
 cp_hash_func
 create_upper_paths_hook_type
 createdb_failure_params
-- 
2.43.5



  [text/x-diff] v3-0004-Allow-extension-functions-to-participate-in-in-pl.patch (17.1K, 5-v3-0004-Allow-extension-functions-to-participate-in-in-pl.patch)
  download | inline diff:
From 187a54ff5bcb52cab1bd209fe9d3f41884a04f6d Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Wed, 15 Jan 2025 12:47:16 -0500
Subject: [PATCH v3 4/4] Allow extension functions to participate in in-place
 updates.

Commit 1dc5ebc90 allowed PL/pgSQL to perform in-place updates
of expanded-object variables that are being updated with
assignments like "x := f(x, ...)".  However this was allowed
only for a hard-wired list of functions f(), since we need to
be sure that f() will not modify the variable if it fails.
It was always envisioned that we should make that extensible,
but at the time we didn't have a good way to do so.  Since
then we've invented the idea of "support functions" to allow
attaching specialized optimization knowledge to functions,
and that is a perfect mechanism for doing this.

Hence, adjust PL/pgSQL to use a support function request
instead of hard-wired logic to decide if in-place update
is safe.  Replace the previous behavior by creating support
functions for the three functions that were previously
hard-wired.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/backend/utils/adt/array_userfuncs.c       | 61 +++++++++++++
 src/backend/utils/adt/arraysubs.c             | 34 ++++++++
 src/include/catalog/pg_proc.dat               | 20 +++--
 src/include/nodes/supportnodes.h              | 55 +++++++++++-
 src/pl/plpgsql/src/expected/plpgsql_array.out |  3 +-
 src/pl/plpgsql/src/pl_exec.c                  | 86 ++++++++-----------
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |  1 +
 src/tools/pgindent/typedefs.list              |  1 +
 8 files changed, 202 insertions(+), 59 deletions(-)

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 0b02fe3744..2aae2f8ed9 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -16,6 +16,7 @@
 #include "common/int.h"
 #include "common/pg_prng.h"
 #include "libpq/pqformat.h"
+#include "nodes/supportnodes.h"
 #include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
@@ -167,6 +168,36 @@ array_append(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_append_support()
+ *
+ * Planner support function for array_append()
+ */
+Datum
+array_append_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place appends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) linitial(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_prepend :
  *		push an element onto the front of a one-dimensional array
@@ -230,6 +261,36 @@ array_prepend(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_prepend_support()
+ *
+ * Planner support function for array_prepend()
+ */
+Datum
+array_prepend_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place prepends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) lsecond(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_cat :
  *		concatenate two nD arrays to form an nD array, or
diff --git a/src/backend/utils/adt/arraysubs.c b/src/backend/utils/adt/arraysubs.c
index 562179b379..2940fb8e8d 100644
--- a/src/backend/utils/adt/arraysubs.c
+++ b/src/backend/utils/adt/arraysubs.c
@@ -18,6 +18,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/subscripting.h"
+#include "nodes/supportnodes.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_expr.h"
 #include "utils/array.h"
@@ -575,3 +576,36 @@ raw_array_subscript_handler(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(&sbsroutines);
 }
+
+/*
+ * array_subscript_handler_support()
+ *
+ * Planner support function for array_subscript_handler()
+ */
+Datum
+array_subscript_handler_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place subscripted assignment if the refexpr is
+		 * the array being assigned to.  We don't need to worry about array
+		 * references within the refassgnexpr or the subscripts; however, if
+		 * there's no refassgnexpr then it's a fetch which there's no need to
+		 * optimize.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *refexpr = (Param *) linitial(req->args);
+
+		if (refexpr && IsA(refexpr, Param) &&
+			refexpr->paramkind == PARAM_EXTERN &&
+			refexpr->paramid == req->paramid &&
+			lsecond(req->args) != NULL)
+			ret = (Node *) refexpr;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ba02ba53b2..9cdd81463d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1598,14 +1598,20 @@
   proname => 'cardinality', prorettype => 'int4', proargtypes => 'anyarray',
   prosrc => 'array_cardinality' },
 { oid => '378', descr => 'append element onto end of array',
-  proname => 'array_append', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_append', prosupport => 'array_append_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_append' },
+{ oid => '8680', descr => 'planner support for array_append',
+  proname => 'array_append_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_append_support' },
 { oid => '379', descr => 'prepend element onto front of array',
-  proname => 'array_prepend', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_prepend', prosupport => 'array_prepend_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatible anycompatiblearray',
   prosrc => 'array_prepend' },
+{ oid => '8681', descr => 'planner support for array_prepend',
+  proname => 'array_prepend_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_prepend_support' },
 { oid => '383',
   proname => 'array_cat', proisstrict => 'f',
   prorettype => 'anycompatiblearray',
@@ -12182,8 +12188,12 @@
 
 # subscripting support for built-in types
 { oid => '6179', descr => 'standard array subscripting support',
-  proname => 'array_subscript_handler', prorettype => 'internal',
+  proname => 'array_subscript_handler',
+  prosupport => 'array_subscript_handler_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'array_subscript_handler' },
+{ oid => '8682', descr => 'planner support for array_subscript_handler',
+  proname => 'array_subscript_handler_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_subscript_handler_support' },
 { oid => '6180', descr => 'raw array subscripting support',
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index ad5d43a2a7..9c047cc401 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -6,10 +6,10 @@
  * This file defines the API for "planner support functions", which
  * are SQL functions (normally written in C) that can be attached to
  * another "target" function to give the system additional knowledge
- * about the target function.  All the current capabilities have to do
- * with planning queries that use the target function, though it is
- * possible that future extensions will add functionality to be invoked
- * by the parser or executor.
+ * about the target function.  The name is now something of a misnomer,
+ * since some of the call sites are in the executor not the planner,
+ * but "function support function" would be a confusing name so we
+ * stick with "planner support function".
  *
  * A support function must have the SQL signature
  *		supportfn(internal) returns internal
@@ -343,4 +343,51 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+/*
+ * The ModifyInPlace request allows the support function to detect whether
+ * a call to its target function can be allowed to modify a read/write
+ * expanded object in-place.  The context is that we are considering a
+ * PL/pgSQL (or similar PL) assignment of the form "x := f(x, ...)" where
+ * the variable x is of a type that can be represented as an expanded object
+ * (see utils/expandeddatum.h).  If f() can usefully optimize by modifying
+ * the passed-in object in-place, then this request can be implemented to
+ * instruct PL/pgSQL to pass a read-write expanded pointer to the variable's
+ * value.  (Note that there is no guarantee that later calls to f() will
+ * actually do so.  If f() receives a read-only pointer, or a pointer to a
+ * non-expanded object, it must follow the usual convention of not modifying
+ * the pointed-to object.)  There are two requirements that must be met
+ * to make this safe:
+ * 1. f() must guarantee that it will not have modified the object if it
+ * fails.  Otherwise the variable's value might change unexpectedly.
+ * 2. If the other arguments to f() ("..." in the above example) contain
+ * references to x, f() must be able to cope with that; or if that's not
+ * safe, the support function must scan the other arguments to verify that
+ * there are no other references to x.  An example of the concern here is
+ * that in "arr := array_append(arr, arr[1])", if the array element type
+ * is pass-by-reference then array_append would receive a second argument
+ * that points into the array object it intends to modify.  array_append is
+ * coded to make that safe, but other functions might not be able to cope.
+ *
+ * "args" is a node tree list representing the function's arguments.
+ * One or more nodes within the node tree will be PARAM_EXTERN Params
+ * with ID "paramid", which represent the assignment target variable.
+ * (Note that such references are not necessarily at top level in the list,
+ * for example we might have "x := f(x, g(x))".  Generally it's only safe
+ * to optimize a reference that is at top level, else we're making promises
+ * about the behavior of g() as well as f().)
+ *
+ * If modify-in-place is safe, the support function should return the
+ * address of the Param node that is to return a read-write pointer.
+ * (At most one of the references is allowed to do so.)  Otherwise,
+ * return NULL.
+ */
+typedef struct SupportRequestModifyInPlace
+{
+	NodeTag		type;
+
+	Oid			funcid;			/* PG_PROC OID of the target function */
+	List	   *args;			/* Arguments to the function */
+	int			paramid;		/* ID of Param(s) representing variable */
+} SupportRequestModifyInPlace;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index e5db6d6087..4c6b3ce998 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -57,10 +57,11 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
-NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
+NOTICE:  a = {1,1,2,3,42,3,1,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 7c33c49e65..efafe2890d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -29,6 +29,7 @@
 #include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
@@ -8404,7 +8405,7 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
-	ListCell   *lc;
+	Oid			prosupport;
 
 	/* Assume unsafe */
 	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
@@ -8473,64 +8474,51 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	{
 		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
-		/* We only trust standard varlena arrays to be safe */
-		/* TODO: install some extensibility here */
-		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
-			F_ARRAY_SUBSCRIPT_HANDLER)
-			return;
-
-		/* We can optimize the refexpr if it's the target, otherwise not */
-		if (sbsref->refexpr && IsA(sbsref->refexpr, Param))
-		{
-			Param	   *param = (Param *) sbsref->refexpr;
+		funcid = get_typsubscript(sbsref->refcontainertype, NULL);
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
-
-		return;
+		/*
+		 * We assume that only the refexpr and refassgnexpr (if any) are
+		 * relevant to the support function's decision.  If that turns out to
+		 * be a bad idea, we could incorporate the subscript expressions into
+		 * the fargs list somehow.
+		 */
+		fargs = list_make2(sbsref->refexpr, sbsref->refassgnexpr);
 	}
 	else
 		return;
 
 	/*
-	 * The top-level function must be one that we trust to be "safe".
-	 * Currently we hard-wire the list, but it would be very desirable to
-	 * allow extensions to mark their functions as safe ...
+	 * The top-level function must be one that can handle in-place update
+	 * safely.  We allow functions to declare their ability to do that via a
+	 * support function request.
 	 */
-	if (!(funcid == F_ARRAY_APPEND ||
-		  funcid == F_ARRAY_PREPEND))
-		return;
-
-	/*
-	 * The target variable (in the form of a Param) must appear as a direct
-	 * argument of the top-level function.  References further down in the
-	 * tree can't be optimized; but on the other hand, they don't invalidate
-	 * optimizing the top-level call, since that will be executed last.
-	 */
-	foreach(lc, fargs)
+	prosupport = get_func_support(funcid);
+	if (OidIsValid(prosupport))
 	{
-		Node	   *arg = (Node *) lfirst(lc);
+		SupportRequestModifyInPlace req;
+		Param	   *param;
 
-		if (arg && IsA(arg, Param))
-		{
-			Param	   *param = (Param *) arg;
+		req.type = T_SupportRequestModifyInPlace;
+		req.funcid = funcid;
+		req.args = fargs;
+		req.paramid = paramid;
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
+		param = (Param *)
+			DatumGetPointer(OidFunctionCall1(prosupport,
+											 PointerGetDatum(&req)));
+
+		if (param == NULL)
+			return;				/* support function fails */
+
+		/* Verify support function followed the API */
+		Assert(IsA(param, Param));
+		Assert(param->paramkind == PARAM_EXTERN);
+		Assert(param->paramid == paramid);
+
+		/* Found the Param we want to pass as read/write */
+		expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
+		expr->expr_rw_param = param;
+		return;
 	}
 }
 
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4a346203dc..da984a9941 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -53,6 +53,7 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bf78131001..6ba5fbc3a5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2796,6 +2796,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestModifyInPlace
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.43.5



^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-21 18:05  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2025-01-21 18:05 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

On Wed, Jan 15, 2025 at 10:09 AM Tom Lane <[email protected]> wrote:

> I noticed that v2 of this patch series failed to apply after
> 7b27f5fd3, so here's v3.  No non-trivial changes.
>

Thanks Tom!  These applied cleanly to my test env and actually increased my
livejournal graph benchmark by a million edges per second, so I guess I
didn't have all the previous changes in my last build as you noted.

-Michel


^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-21 18:12  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-01-21 18:12 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Pavel Stehule <[email protected]>; [email protected]

Michel Pelletier <[email protected]> writes:
> Thanks Tom!  These applied cleanly to my test env and actually increased my
> livejournal graph benchmark by a million edges per second, so I guess I
> didn't have all the previous changes in my last build as you noted.

Nice!  I hope somebody will review this, because I'd really like to
get it into v18, and feature freeze is getting closer.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-26 10:07  Andrey Borodin <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Andrey Borodin @ 2025-01-26 10:07 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Hello everyone in this thread.

> On 21 Jan 2025, at 23:12, Tom Lane <[email protected]> wrote:
> 
> somebody will review this

I'm trying to dig into the patch set. My knowledge of the module is shallow and I hope to improve it by reading more patches in this area.

This patch set provides a new test, which runs just fine without the patch. But that's somewhat expected: such optimizations must be transparent to the user...

And the coverage of the newly invented mark_stmt() is 42.37%. Some of the branches are easy no-ops, but some are not.
I take it for granted that we will never get into an infinite loop in a recursive call of mark_stmt().

expr_is_assignment_source() is named as if it should return bool, but it's void.

I could not answer one general question about the new optimization rule from reading the code. What cost does checking for a possible in-place update impose on code that cannot have this optimization? Is it O(number_of_arguments) for every assignment execution?

Thanks!


Best regards, Andrey Borodin.





^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-26 15:37  Tom Lane <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-01-26 15:37 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Andrey Borodin <[email protected]> writes:
>> On 21 Jan 2025, at 23:12, Tom Lane <[email protected]> wrote:
>> somebody will review this

> I'm trying to dig into the patch set. My knowledge of the module is shallow and I hope to improve it by reading more patches in this area.

Thanks for looking!

> And the coverage of the newly invented mark_stmt() is 42.37%. Some of the branches are easy no-ops, but some are not.

Yeah.  I'm not too concerned about that because it's pretty much a
copy-and-paste of the adjacent code.  Maybe we should think about
some way of refactoring pl_funcs.c to reduce duplication, but I
don't have any great ideas about how.

> expr_is_assignment_source() is named as if it should return bool, but it's void.

I've been less than satisfied with that name too.  I intended it
as a statement of fact, "this expression has been found to be 
the source of an assignment".  But it does seem confusing.
Maybe we should recast it as an action.  What do you think of
"mark_expr_as_assignment_source"?

> I could not answer one general question about the new optimization rule from reading the code. What cost does checking for a possible in-place update impose on code that cannot have this optimization? Is it O(number_of_arguments) for every assignment execution?

No, the extra effort is incurred at most once per assignment statement
per session.  (Unless the plpgsql function's cache entry gets
invalidated, in which case we'd rebuild all of the function's data
structures and have to redo this work too.)  We set up the evaluation
function "paramfunc" as plpgsql_param_eval_var_check if we think we
might be able to apply this optimization, or plpgsql_param_eval_var_ro
if we don't think so but the variable is of varlena type.  At runtime,
if the variable's current value is not actually expanded, then
plpgsql_param_eval_var_check falls through doing essentially the same
work as plpgsql_param_eval_var_ro, so there should be no added cost.
The first time we observe that the value *is* expanded, we incur the
cost to detect whether an optimization is really possible, and then
we change the "paramfunc" pointer to be the appropriate function
so as to apply the optimization or not without rechecking.  So
generally speaking, if we're considering a variable of a type that
doesn't support expansion, there should be zero extra per-execution
cost.  There is some extra cost at function compilation time to
determine which expressions are assignment sources (but we were doing
that already) and to discover whether those assignments are to
nonlocal variables (which is new work, but only needs to be done in
functions with exception blocks).
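
To make the "self-replacing paramfunc" idea above concrete, here is a
minimal standalone sketch.  It is not the pl_exec.c code: ParamStep,
eval_check, eval_ro, eval_rw and the can_optimize flag are hypothetical
stand-ins, and the real safety analysis is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What kind of pointer the Param evaluation hands to the expression. */
typedef enum { PTR_READ_ONLY, PTR_READ_WRITE } PtrKind;

typedef struct ParamStep ParamStep;
typedef PtrKind (*ParamFunc) (ParamStep *step, bool value_is_expanded);

struct ParamStep
{
	ParamFunc	paramfunc;		/* current evaluation function */
	bool		can_optimize;	/* stand-in for the real safety analysis */
	int			checks_done;	/* how often the costly check has run */
};

/* Always hand out a read-only pointer. */
static PtrKind
eval_ro(ParamStep *step, bool value_is_expanded)
{
	(void) step;
	(void) value_is_expanded;
	return PTR_READ_ONLY;
}

/* Toy rule: hand out a read/write pointer when the value is expanded. */
static PtrKind
eval_rw(ParamStep *step, bool value_is_expanded)
{
	(void) step;
	return value_is_expanded ? PTR_READ_WRITE : PTR_READ_ONLY;
}

/*
 * Initial function: behaves like eval_ro until an expanded value shows up,
 * then runs the costly check once and replaces itself in the step.
 */
static PtrKind
eval_check(ParamStep *step, bool value_is_expanded)
{
	assert(step != NULL);

	if (!value_is_expanded)
		return eval_ro(step, value_is_expanded);

	step->checks_done++;		/* the one-time analysis happens here */
	step->paramfunc = step->can_optimize ? eval_rw : eval_ro;
	return step->paramfunc(step, value_is_expanded);
}
```

A step initialized with paramfunc = eval_check keeps that pointer for as
long as only flat values are seen; after the first expanded value the
pointer is eval_rw or eval_ro and the check is never repeated.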

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-26 17:04  Andrey Borodin <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Andrey Borodin @ 2025-01-26 17:04 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]



> On 26 Jan 2025, at 20:37, Tom Lane <[email protected]> wrote:
> 
>> And the coverage of the newly invented mark_stmt() is 42.37%. Some of the branches are easy no-ops, but some are not.
> 
> Yeah.  I'm not too concerned about that because it's pretty much a
> copy-and-paste of the adjacent code.  Maybe we should think about
> some way of refactoring pl_funcs.c to reduce duplication, but I
> don't have any great ideas about how.

OK, now I get it. The whole purpose of the 2nd patch is to do
    if (expr->target_param >= 0)
        expr->target_is_local = bms_is_member(expr->target_param, local_dnos);
for local variables.

> 
>> expr_is_assignment_source() is named as if it should return bool, but it's void.
> 
> I've been less than satisfied with that name too.  I intended it
> as a statement of fact, "this expression has been found to be 
> the source of an assignment".  But it does seem confusing.
> Maybe we should recast it as an action.  What do you think of
> "mark_expr_as_assignment_source"?

Sounds better to me. I found no examples of similar functions in either pl_gram.y or gram.y, so IMO mark_expr_as_assignment_source() is the best candidate.

> 
>> I could not answer one general question about the new optimization rule from reading the code. What cost does checking for a possible in-place update impose on code that cannot have this optimization? Is it O(number_of_arguments) for every assignment execution?
> 
> No, the extra effort is incurred at most once per assignment statement
> per session.  (Unless the plpgsql function's cache entry gets
> invalidated, in which case we'd rebuild all of the function's data
> structures and have to redo this work too.)

OK, I think the execution benefits justify these preparatory costs.

>  We set up the evaluation
> function "paramfunc" as plpgsql_param_eval_var_check if we think we
> might be able to apply this optimization, or plpgsql_param_eval_var_ro
> if we don't think so but the variable is of varlena type.  At runtime,
> if the variable's current value is not actually expanded, then
> plpgsql_param_eval_var_check falls through doing essentially the same
> work as plpgsql_param_eval_var_ro, so there should be no added cost.
> The first time we observe that the value *is* expanded, we incur the
> cost to detect whether an optimization is really possible, and then
> we change the "paramfunc" pointer to be the appropriate function
> so as to apply the optimization or not without rechecking.  So
> generally speaking, if we're considering a variable of a type that
> doesn't support expansion, there should be zero extra per-execution
> cost.  There is some extra cost at function compilation time to
> determine which expressions are assignment sources (but we were doing
> that already) and to discover whether those assignments are to
> nonlocal variables (which is new work, but only needs to be done in
> functions with exception blocks).

Got it, many thanks for the explanation.

But I've got some new questions:

I'm lost in the internals of ExprEvalStep. But void *paramarg and its friend void *paramarg2 are cryptic. They always have the same type and the same meaning, but have very generic names.

I wonder if you plan similar optimizations for array_cat(), array_remove() etc?

+   a := a || a; -- not optimizable

Why is it not optimizable? Because there is no support function, because array_cat() has no support function, or something else?

Besides this, the patch looks good to me.


Best regards, Andrey Borodin.





^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-26 19:04  Tom Lane <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-01-26 19:04 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Andrey Borodin <[email protected]> writes:
> On 26 Jan 2025, at 20:37, Tom Lane <[email protected]> wrote:
>> Maybe we should recast it as an action.  What do you think of
>> "mark_expr_as_assignment_source"?

> Sounds better to me. I found no examples of similar functions in either pl_gram.y or gram.y, so IMO mark_expr_as_assignment_source() is the best candidate.

WFM, I'll make it so in next version.

> Got it, many thanks for the explanation.

I'll see about incorporating more of that in the comments, too.

> I wonder if you plan similar optimizations for array_cat(), array_remove() etc?
> +   a := a || a; -- not optimizable
> Why is it not optimizable? Because there is no support function, because array_cat() has no support function, or something else?

plpgsql won't attempt to optimize it because "a" is referenced twice
and there is no support function that might say it's safe anyway.
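
As a rough illustration of that decision, here is a toy classifier over
a two-argument expression tree.  It is only one plausible reading of the
three test cases in plpgsql_array.sql, not the patch's logic: Node,
RwOpt, count_param_refs, classify_assignment and the inplace_ok flag
(standing in for a support function's answer) are all hypothetical, and
other conditions such as whether the target variable is local are
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NODE_PARAM, NODE_CONST, NODE_FUNC } NodeKind;

typedef struct Node
{
	NodeKind	kind;
	int			paramid;		/* NODE_PARAM only */
	bool		inplace_ok;		/* NODE_FUNC only: toy "support function" */
	struct Node *args[2];		/* NODE_FUNC only; NULL if unused */
} Node;

typedef enum { RWOPT_NOPE, RWOPT_TRANSFER, RWOPT_INPLACE } RwOpt;

/* Count references to the given parameter anywhere in the tree. */
static int
count_param_refs(const Node *n, int paramid)
{
	if (n == NULL)
		return 0;
	if (n->kind == NODE_PARAM)
		return (n->paramid == paramid) ? 1 : 0;
	if (n->kind == NODE_FUNC)
		return count_param_refs(n->args[0], paramid) +
			count_param_refs(n->args[1], paramid);
	return 0;
}

/* Toy decision for "target := expr". */
static RwOpt
classify_assignment(const Node *expr, int target)
{
	int			nrefs;

	assert(expr != NULL);
	nrefs = count_param_refs(expr, target);

	if (nrefs == 0)
		return RWOPT_NOPE;		/* target not used in the expression */
	if (nrefs == 1)
		return RWOPT_TRANSFER;	/* sole reference: value can be handed over */
	if (expr->kind == NODE_FUNC && expr->inplace_ok)
		return RWOPT_INPLACE;	/* top-level function vouches for safety */
	return RWOPT_NOPE;			/* several references, nobody vouches */
}
```

With trees built for the three test statements, "a := array_append(a, 42)"
has one reference to a, "a := a || a[3]" has two but a top-level
function that says in-place is fine, and "a := a || a" has two with no
such assurance.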

array_cat doesn't currently have any special smarts about expanded
arrays, so it's all moot because the arrays would get flattened
on the way into it.  If we did improve it to be able to cope with
expanded arrays, I'm not real sure that it could safely manage an
in-place update where the two inputs are the same array --- at
the least, some extreme care would be needed to get the right
answers.
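
The kind of care involved can be seen in a toy analogy outside
PostgreSQL: a plain growable int vector concatenated with itself in
place.  IntVec, vec_reserve, vec_append and vec_concat_self are
hypothetical names for this sketch only; nothing here is array_cat or
the expanded-array code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct IntVec
{
	int		   *elems;
	size_t		n;				/* elements in use */
	size_t		cap;			/* elements allocated */
} IntVec;

/* Grow storage; on failure the vector is left exactly as it was. */
static bool
vec_reserve(IntVec *v, size_t want)
{
	int		   *p;

	assert(v != NULL);
	if (want <= v->cap)
		return true;
	p = realloc(v->elems, want * sizeof(int));
	if (p == NULL)
		return false;			/* old buffer still valid and untouched */
	v->elems = p;
	v->cap = want;
	return true;
}

static bool
vec_append(IntVec *v, int x)
{
	if (!vec_reserve(v, v->n + 1))
		return false;
	v->elems[v->n++] = x;
	return true;
}

/* In-place "v := v || v", where source and destination are one object. */
static bool
vec_concat_self(IntVec *v)
{
	size_t		old_n = v->n;	/* snapshot: the length changes under us */

	if (old_n == 0)
		return true;
	if (!vec_reserve(v, old_n * 2))
		return false;			/* fail before modifying anything */

	/* Read through v->elems only after the possible move by realloc. */
	memcpy(v->elems + old_n, v->elems, old_n * sizeof(int));
	v->n = old_n * 2;
	return true;
}
```

Two things have to be right even in this small case: the source length
is captured before the object grows, and no pointer into the old
storage is kept across the reallocation.  Doing all fallible work first
also keeps the "unchanged on failure" promise that in-place update
relies on.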

I'm not real excited about optimizing additional array operations
anyway.  Maybe some more will get done at some point, but I don't
see that as part of this work.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-30 20:32  Pavel Borisov <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Pavel Borisov @ 2025-01-30 20:32 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Hi, Michel and Tom!

On Sun, 26 Jan 2025 at 23:04, Tom Lane <[email protected]> wrote:
>
> Andrey Borodin <[email protected]> writes:
> > On 26 Jan 2025, at 20:37, Tom Lane <[email protected]> wrote:
> >> Maybe we should recast it as an action.  What do you think of
> >> "mark_expr_as_assignment_source"?
>
> > Sounds better to me. I found no examples of similar functions in either pl_gram.y or gram.y, so IMO mark_expr_as_assignment_source() is the best candidate.
>
> WFM, I'll make it so in next version.
>
> > Got it, many thanks for the explanation.
>
> I'll see about incorporating more of that in the comments, too.
>
> > I wonder if you plan similar optimizations for array_cat(), array_remove() etc?
> > +   a := a || a; -- not optimizable
> > Why is it not optimizable? Because there is no support function, because array_cat() has no support function, or something else?
>
> plpgsql won't attempt to optimize it because "a" is referenced twice
> and there is no support function that might say it's safe anyway.
>
> array_cat doesn't currently have any special smarts about expanded
> arrays, so it's all moot because the arrays would get flattened
> on the way into it.  If we did improve it to be able to cope with
> expanded arrays, I'm not real sure that it could safely manage an
> in-place update where the two inputs are the same array --- at
> the least, some extreme care would be needed to get the right
> answers.
>
> I'm not real excited about optimizing additional array operations
> anyway.  Maybe some more will get done at some point, but I don't
> see that as part of this work.
>
>                         regards, tom lane

I started looking at the patchset.
It recently picked up conflicts with the changes to yyparse (473a575e05979b4db).
Could you rebase it and also make the naming changes proposed by Andrey
Borodin, which I definitely agree with?

Regards,
Pavel Borisov







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-01-31 01:53  Tom Lane <[email protected]>
  parent: Pavel Borisov <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-01-31 01:53 UTC (permalink / raw)
  To: Pavel Borisov <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Pavel Borisov <[email protected]> writes:
> I started looking at the patchset.
> It recently picked up conflicts with the changes to yyparse (473a575e05979b4db).
> Could you rebase it and also make the naming changes proposed by Andrey
> Borodin, which I definitely agree with?

Hmm, it seemed to still apply for me.  But anyway, I needed to make
the other changes, so here's v4.

			regards, tom lane



Attachments:

  [text/x-diff] v4-0001-Preliminary-refactoring.patch (9.9K, 2-v4-0001-Preliminary-refactoring.patch)
From 3b40c1acdfb35df6ae6b95e96c9c0fac479eb2c6 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Thu, 30 Jan 2025 19:53:23 -0500
Subject: [PATCH v4 1/4] Preliminary refactoring.

This short and boring patch simply moves the responsibility for
initializing PLpgSQL_expr.target_param into plpgsql parsing,
rather than doing it at first execution of the expr as before.
This doesn't save anything in terms of runtime, since the work was
trivial and done only once per expr anyway.  But it makes the info
available during parsing, which will be useful for the next step.

Likewise set PLpgSQL_expr.func during parsing.  According to the
comments, this was once impossible; but it's certainly possible
since we invented the plpgsql_curr_compile variable.  Again, this
saves little runtime, but it seems far cleaner conceptually.

While at it, I reordered stuff in struct PLpgSQL_expr to make it
clearer which fields are filled when, and merged some duplicative
code in pl_gram.y.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_exec.c | 27 ---------------
 src/pl/plpgsql/src/pl_gram.y | 65 ++++++++++++++++++++++++------------
 src/pl/plpgsql/src/plpgsql.h | 31 +++++++++--------
 3 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 35cda55cf9..fec1811ae1 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4174,12 +4174,6 @@ exec_prepare_plan(PLpgSQL_execstate *estate,
 	SPIPlanPtr	plan;
 	SPIPrepareOptions options;
 
-	/*
-	 * The grammar can't conveniently set expr->func while building the parse
-	 * tree, so make sure it's set before parser hooks need it.
-	 */
-	expr->func = estate->func;
-
 	/*
 	 * Generate and save the plan
 	 */
@@ -5016,21 +5010,7 @@ exec_assign_expr(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 	 * If first time through, create a plan for this expression.
 	 */
 	if (expr->plan == NULL)
-	{
-		/*
-		 * Mark the expression as being an assignment source, if target is a
-		 * simple variable.  (This is a bit messy, but it seems cleaner than
-		 * modifying the API of exec_prepare_plan for the purpose.  We need to
-		 * stash the target dno into the expr anyway, so that it will be
-		 * available if we have to replan.)
-		 */
-		if (target->dtype == PLPGSQL_DTYPE_VAR)
-			expr->target_param = target->dno;
-		else
-			expr->target_param = -1;	/* should be that already */
-
 		exec_prepare_plan(estate, expr, 0);
-	}
 
 	value = exec_eval_expr(estate, expr, &isnull, &valtype, &valtypmod);
 	exec_assign_value(estate, target, value, isnull, valtype, valtypmod);
@@ -6282,13 +6262,6 @@ setup_param_list(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 		 * that they are interrupting an active use of parameters.
 		 */
 		paramLI->parserSetupArg = expr;
-
-		/*
-		 * Also make sure this is set before parser hooks need it.  There is
-		 * no need to save and restore, since the value is always correct once
-		 * set.  (Should be set already, but let's be sure.)
-		 */
-		expr->func = estate->func;
 	}
 	else
 	{
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 64d2c362bf..f55aefb100 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -61,6 +61,10 @@ static	bool			tok_is_keyword(int token, union YYSTYPE *lval,
 static	void			word_is_not_variable(PLword *word, int location, yyscan_t yyscanner);
 static	void			cword_is_not_variable(PLcword *cword, int location, yyscan_t yyscanner);
 static	void			current_token_is_not_variable(int tok, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yyscan_t yyscanner);
+static	PLpgSQL_expr	*make_plpgsql_expr(const char *query,
+										   RawParseMode parsemode);
+static	void			mark_expr_as_assignment_source(PLpgSQL_expr *expr,
+													   PLpgSQL_datum *target);
 static	PLpgSQL_expr	*read_sql_construct(int until,
 											int until2,
 											int until3,
@@ -536,6 +540,10 @@ decl_statement	: decl_varname decl_const decl_datatype decl_collate decl_notnull
 									 errmsg("variable \"%s\" must have a default value, since it's declared NOT NULL",
 											var->refname),
 									 parser_errposition(@5)));
+
+						if (var->default_val != NULL)
+							mark_expr_as_assignment_source(var->default_val,
+														   (PLpgSQL_datum *) var);
 					}
 				| decl_varname K_ALIAS K_FOR decl_aliasitem ';'
 					{
@@ -996,6 +1004,7 @@ stmt_assign		: T_DATUM
 													   false, true,
 													   NULL, NULL,
 													   &yylval, &yylloc, yyscanner);
+						mark_expr_as_assignment_source(new->expr, $1.datum);
 
 						$$ = (PLpgSQL_stmt *) new;
 					}
@@ -2651,6 +2660,38 @@ current_token_is_not_variable(int tok, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yysca
 		yyerror(yyllocp, NULL, yyscanner, "syntax error");
 }
 
+/* Convenience routine to construct a PLpgSQL_expr struct */
+static PLpgSQL_expr *
+make_plpgsql_expr(const char *query,
+				  RawParseMode parsemode)
+{
+	PLpgSQL_expr *expr = palloc0(sizeof(PLpgSQL_expr));
+
+	expr->query = pstrdup(query);
+	expr->parseMode = parsemode;
+	expr->func = plpgsql_curr_compile;
+	expr->ns = plpgsql_ns_top();
+	/* might get changed later during parsing: */
+	expr->target_param = -1;
+	/* other fields are left as zeroes until first execution */
+	return expr;
+}
+
+/* Mark a PLpgSQL_expr as being the source of an assignment to target */
+static void
+mark_expr_as_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
+{
+	/*
+	 * Mark the expression as being an assignment source, if target is a
+	 * simple variable.  We don't currently support optimized assignments to
+	 * other DTYPEs, so no need to mark in other cases.
+	 */
+	if (target->dtype == PLPGSQL_DTYPE_VAR)
+		expr->target_param = target->dno;
+	else
+		expr->target_param = -1;	/* should be that already */
+}
+
 /* Convenience routine to read an expression with one possible terminator */
 static PLpgSQL_expr *
 read_sql_expression(int until, const char *expected, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yyscan_t yyscanner)
@@ -2794,13 +2835,7 @@ read_sql_construct(int until,
 	 */
 	plpgsql_append_source_text(&ds, startlocation, endlocation, yyscanner);
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = parsemode;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, parsemode);
 	pfree(ds.data);
 
 	if (valid_sql)
@@ -3122,13 +3157,7 @@ make_execsql_stmt(int firsttoken, int location, PLword *word, YYSTYPE *yylvalp,
 	while (ds.len > 0 && scanner_isspace(ds.data[ds.len - 1]))
 		ds.data[--ds.len] = '\0';
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_DEFAULT;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_DEFAULT);
 	pfree(ds.data);
 
 	check_sql_expr(expr->query, expr->parseMode, location, yyscanner);
@@ -4006,13 +4035,7 @@ read_cursor_args(PLpgSQL_var *cursor, int until, YYSTYPE *yylvalp, YYLTYPE *yyll
 			appendStringInfoString(&ds, ", ");
 	}
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_PLPGSQL_EXPR;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_PLPGSQL_EXPR);
 	pfree(ds.data);
 
 	/* Next we'd better find the until token */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 441df5354e..b0052167ee 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -219,14 +219,22 @@ typedef struct PLpgSQL_expr
 {
 	char	   *query;			/* query string, verbatim from function body */
 	RawParseMode parseMode;		/* raw_parser() mode to use */
-	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
-	Bitmapset  *paramnos;		/* all dnos referenced by this query */
+	struct PLpgSQL_function *func;	/* function containing this expr */
+	struct PLpgSQL_nsitem *ns;	/* namespace chain visible to this expr */
 
-	/* function containing this expr (not set until we first parse query) */
-	struct PLpgSQL_function *func;
+	/*
+	 * These fields are used to help optimize assignments to expanded-datum
+	 * variables.  If this expression is the source of an assignment to a
+	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 */
+	int			target_param;	/* dno of assign target, or -1 if none */
 
-	/* namespace chain visible to this expr */
-	struct PLpgSQL_nsitem *ns;
+	/*
+	 * Fields above are set during plpgsql parsing.  Remaining fields are left
+	 * as zeroes/NULLs until we first parse/plan the query.
+	 */
+	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
+	Bitmapset  *paramnos;		/* all dnos referenced by this query */
 
 	/* fields for "simple expression" fast-path execution: */
 	Expr	   *expr_simple_expr;	/* NULL means not a simple expr */
@@ -235,14 +243,11 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * These fields are used to optimize assignments to expanded-datum
-	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno; else it's -1.
-	 * If we match a Param within expr_simple_expr to such a variable, that
-	 * Param's address is stored in expr_rw_param; then expression code
-	 * generation will allow the value for that Param to be passed read/write.
+	 * If we match a Param within expr_simple_expr to the variable identified
+	 * by target_param, that Param's address is stored in expr_rw_param; then
+	 * expression code generation will allow the value for that Param to be
+	 * passed as a read/write expanded-object pointer.
 	 */
-	int			target_param;	/* dno of assign target, or -1 if none */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
-- 
2.43.5
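
The shape of the refactoring in v4-0001 above can be modeled outside PostgreSQL with a small standalone sketch. The ToyDatum and ToyExpr types, and the make_toy_expr and mark_toy_expr_as_assignment_source functions, are simplified stand-ins invented for illustration; they use calloc/malloc in place of palloc0/pstrdup and carry only the fields needed to show the idea. One helper builds the expression with target_param defaulted to -1, and a separate step records the target dno at parse time when the target is a simple variable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for plpgsql datum kinds (illustrative only). */
typedef enum { DTYPE_VAR, DTYPE_REC } ToyDatumType;

typedef struct ToyDatum
{
    ToyDatumType dtype;
    int          dno;
} ToyDatum;

typedef struct ToyExpr
{
    char   *query;
    int     parse_mode;
    int     target_param;   /* dno of assignment target, or -1 if none */
    void   *plan;           /* left NULL until first execution */
} ToyExpr;

/* Single constructor replacing several copies of the same init code. */
static ToyExpr *
make_toy_expr(const char *query, int parse_mode)
{
    ToyExpr *expr = calloc(1, sizeof(ToyExpr));
    size_t   len = strlen(query) + 1;

    if (expr == NULL)
        abort();
    expr->query = malloc(len);
    if (expr->query == NULL)
        abort();
    memcpy(expr->query, query, len);
    expr->parse_mode = parse_mode;
    /* might get changed later during parsing: */
    expr->target_param = -1;
    return expr;
}

/* Record the assignment target, but only for simple variables. */
static void
mark_toy_expr_as_assignment_source(ToyExpr *expr, const ToyDatum *target)
{
    if (target->dtype == DTYPE_VAR)
        expr->target_param = target->dno;
    else
        expr->target_param = -1;
}
```

Because the marking happens while the statement is parsed, later parse-time passes can consult target_param without waiting for the first execution.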



  [text/x-diff] v4-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch (19.5K, 3-v4-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch)
From ee9b359bf16279569f0fdc378f42030eea89ec0b Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Thu, 30 Jan 2025 20:07:08 -0500
Subject: [PATCH v4 2/4] Detect whether plpgsql assignment targets are "local"
 variables.

Mark whether the target of a potentially optimizable assignment
is "local", in the sense of being declared inside any exception
block that could trap an error thrown from the assignment.
(This implies that we needn't preserve the variable's value
in case of an error.  This patch doesn't do anything with the
knowledge, but the next one will.)

Normally, this requires a post-parsing scan of the function's
parse tree, since we don't know while parsing a BEGIN ...
construct whether we will find EXCEPTION at its end.  However,
if there are no BEGIN ... EXCEPTION blocks in the function at
all, then all assignments are local, even those to variables
representing function arguments.  We optimize that common case
by initializing the target_is_local flags to "true", and fixing
them up with a post-scan only if we found EXCEPTION.

The scan is implemented by code that's largely copied-and-pasted
from the nearby code to scan a plpgsql parse tree for deletion.
It's a bit annoying to have three copies of that now, but I'm
not seeing a way to refactor it that would save much code on net.

Note that variables' default-value expressions are never interesting
for expanded-variable optimization, since they couldn't contain a
reference to the target variable anyway.  But the code is set up
to compute their target_param and target_is_local correctly anyway,
for consistency and in case someone thinks of a use for that data.

I added a bit of plpgsql_dumptree support to help verify that
this code sets the flags as expected.  I'm not set on keeping
that, but I do want to keep the addition of a plpgsql_dumptree
call in plpgsql_compile_inline.  It's at best an oversight that
"#option dump" doesn't work in a DO block.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_comp.c  |  12 +
 src/pl/plpgsql/src/pl_funcs.c | 399 ++++++++++++++++++++++++++++++++++
 src/pl/plpgsql/src/pl_gram.y  |  15 ++
 src/pl/plpgsql/src/plpgsql.h  |   7 +-
 4 files changed, 432 insertions(+), 1 deletion(-)

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index a2de0880fb..f36a244140 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -371,6 +371,7 @@ do_compile(FunctionCallInfo fcinfo,
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	/*
 	 * Initialize the compiler, particularly the namespace stack.  The
@@ -811,6 +812,9 @@ do_compile(FunctionCallInfo fcinfo,
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
 	/* Debug dump for completed functions */
 	if (plpgsql_DumpExecTree)
 		plpgsql_dumptree(function);
@@ -906,6 +910,7 @@ plpgsql_compile_inline(char *proc_source)
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	plpgsql_ns_init();
 	plpgsql_ns_push(func_name, PLPGSQL_LABEL_BLOCK);
@@ -962,6 +967,13 @@ plpgsql_compile_inline(char *proc_source)
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
+	/* Debug dump for completed functions */
+	if (plpgsql_DumpExecTree)
+		plpgsql_dumptree(function);
+
 	/*
 	 * Pop the error context stack
 	 */
diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index 8c827fe5cc..d57935b8c1 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -333,6 +333,402 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind)
 }
 
 
+/**********************************************************************
+ * Mark assignment source expressions that have local target variables,
+ * that is, the target variable is declared within the exception block
+ * most closely containing the assignment itself.  (Such target variables
+ * need not be preserved if the assignment's source expression raises an
+ * error, since the variable will no longer be accessible afterwards.
+ * Detecting this allows better optimization.)
+ *
+ * This code need not be called if the plpgsql function contains no exception
+ * blocks, because mark_expr_as_assignment_source will have set all the flags
+ * to true already.  Also, we need not examine default-value expressions for
+ * variables, because variable declarations are necessarily within the nearest
+ * exception block.  (In DECLARE ... BEGIN ... EXCEPTION ... END, the variable
+ * initializations are done before entering the exception scope.)  So it's
+ * sufficient to find assignment statements.
+ *
+ * Within the recursion, local_dnos is a Bitmapset of dnos of variables
+ * known to be declared within the current exception level.
+ **********************************************************************/
+static void mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos);
+static void mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos);
+static void mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos);
+static void mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos);
+static void mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos);
+static void mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos);
+static void mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos);
+static void mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos);
+static void mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos);
+static void mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos);
+static void mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos);
+static void mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos);
+static void mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos);
+static void mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos);
+static void mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos);
+static void mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos);
+static void mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos);
+static void mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos);
+static void mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos);
+static void mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos);
+static void mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos);
+static void mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos);
+static void mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos);
+static void mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos);
+static void mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos);
+static void mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos);
+static void mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos);
+static void mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos);
+
+
+static void
+mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos)
+{
+	switch (stmt->cmd_type)
+	{
+		case PLPGSQL_STMT_BLOCK:
+			mark_block((PLpgSQL_stmt_block *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSIGN:
+			mark_assign((PLpgSQL_stmt_assign *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_IF:
+			mark_if((PLpgSQL_stmt_if *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CASE:
+			mark_case((PLpgSQL_stmt_case *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_LOOP:
+			mark_loop((PLpgSQL_stmt_loop *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_WHILE:
+			mark_while((PLpgSQL_stmt_while *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORI:
+			mark_fori((PLpgSQL_stmt_fori *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORS:
+			mark_fors((PLpgSQL_stmt_fors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORC:
+			mark_forc((PLpgSQL_stmt_forc *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FOREACH_A:
+			mark_foreach_a((PLpgSQL_stmt_foreach_a *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXIT:
+			mark_exit((PLpgSQL_stmt_exit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN:
+			mark_return((PLpgSQL_stmt_return *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_NEXT:
+			mark_return_next((PLpgSQL_stmt_return_next *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_QUERY:
+			mark_return_query((PLpgSQL_stmt_return_query *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RAISE:
+			mark_raise((PLpgSQL_stmt_raise *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSERT:
+			mark_assert((PLpgSQL_stmt_assert *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXECSQL:
+			mark_execsql((PLpgSQL_stmt_execsql *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNEXECUTE:
+			mark_dynexecute((PLpgSQL_stmt_dynexecute *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNFORS:
+			mark_dynfors((PLpgSQL_stmt_dynfors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_GETDIAG:
+			mark_getdiag((PLpgSQL_stmt_getdiag *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_OPEN:
+			mark_open((PLpgSQL_stmt_open *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FETCH:
+			mark_fetch((PLpgSQL_stmt_fetch *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CLOSE:
+			mark_close((PLpgSQL_stmt_close *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_PERFORM:
+			mark_perform((PLpgSQL_stmt_perform *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CALL:
+			mark_call((PLpgSQL_stmt_call *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_COMMIT:
+			mark_commit((PLpgSQL_stmt_commit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ROLLBACK:
+			mark_rollback((PLpgSQL_stmt_rollback *) stmt, local_dnos);
+			break;
+		default:
+			elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type);
+			break;
+	}
+}
+
+static void
+mark_stmts(List *stmts, Bitmapset *local_dnos)
+{
+	ListCell   *s;
+
+	foreach(s, stmts)
+	{
+		mark_stmt((PLpgSQL_stmt *) lfirst(s), local_dnos);
+	}
+}
+
+static void
+mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos)
+{
+	if (block->exceptions)
+	{
+		ListCell   *e;
+
+		/*
+		 * The block creates a new exception scope, so variables declared at
+		 * outer levels are nonlocal.  For that matter, so are any variables
+		 * declared in the block's DECLARE section.  Hence, we must pass down
+		 * empty local_dnos.
+		 */
+		mark_stmts(block->body, NULL);
+
+		foreach(e, block->exceptions->exc_list)
+		{
+			PLpgSQL_exception *exc = (PLpgSQL_exception *) lfirst(e);
+
+			mark_stmts(exc->action, NULL);
+		}
+	}
+	else
+	{
+		/*
+		 * Otherwise, the block does not create a new exception scope, and any
+		 * variables it declares can also be considered local within it.  Note
+		 * that only initializable datum types (VAR, REC) are included in
+		 * initvarnos; but that's sufficient for our purposes.
+		 */
+		local_dnos = bms_copy(local_dnos);
+		for (int i = 0; i < block->n_initvars; i++)
+			local_dnos = bms_add_member(local_dnos, block->initvarnos[i]);
+		mark_stmts(block->body, local_dnos);
+		bms_free(local_dnos);
+	}
+}
+
+static void
+mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos)
+{
+	PLpgSQL_expr *expr = stmt->expr;
+
+	/*
+	 * If this expression has an assignment target, check whether the target
+	 * is local, and mark the expression accordingly.
+	 */
+	if (expr->target_param >= 0)
+		expr->target_is_local = bms_is_member(expr->target_param, local_dnos);
+}
+
+static void
+mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->then_body, local_dnos);
+	foreach(l, stmt->elsif_list)
+	{
+		PLpgSQL_if_elsif *elif = (PLpgSQL_if_elsif *) lfirst(l);
+
+		/* elif->cond cannot be an assignment source */
+		mark_stmts(elif->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_body, local_dnos);
+}
+
+static void
+mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->t_expr cannot be an assignment source */
+	foreach(l, stmt->case_when_list)
+	{
+		PLpgSQL_case_when *cwt = (PLpgSQL_case_when *) lfirst(l);
+
+		/* cwt->expr cannot be an assignment source */
+		mark_stmts(cwt->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_stmts, local_dnos);
+}
+
+static void
+mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->lower, upper, step cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+}
+
+static void
+mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->argquery cannot be an assignment source */
+}
+
+static void
+mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->argquery, query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+}
+
+static void
+mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->params cannot contain an assignment source */
+	/* stmt->options cannot contain an assignment source */
+}
+
+static void
+mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond, message cannot be an assignment source */
+}
+
+static void
+mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->sqlstmt cannot be an assignment source */
+}
+
+static void
+mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos)
+{
+}
+
+void
+plpgsql_mark_local_assignment_targets(PLpgSQL_function *func)
+{
+	Bitmapset  *local_dnos;
+
+	/* Function parameters can be treated as local targets at outer level */
+	local_dnos = NULL;
+	for (int i = 0; i < func->fn_nargs; i++)
+		local_dnos = bms_add_member(local_dnos, func->fn_argvarnos[i]);
+	if (func->action)
+		mark_block(func->action, local_dnos);
+	bms_free(local_dnos);
+}
+
+
 /**********************************************************************
  * Release memory when a PL/pgSQL function is no longer needed
  *
@@ -1594,6 +1990,9 @@ static void
 dump_expr(PLpgSQL_expr *expr)
 {
 	printf("'%s'", expr->query);
+	if (expr->target_param >= 0)
+		printf(" target %d%s", expr->target_param,
+			   expr->target_is_local ? " (local)" : "");
 }
 
 void
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index f55aefb100..8048e040f8 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2328,6 +2328,8 @@ exception_sect	:
 						PLpgSQL_exception_block *new = palloc(sizeof(PLpgSQL_exception_block));
 						PLpgSQL_variable *var;
 
+						plpgsql_curr_compile->has_exception_block = true;
+
 						var = plpgsql_build_variable("sqlstate", lineno,
 													 plpgsql_build_datatype(TEXTOID,
 																			-1,
@@ -2673,6 +2675,7 @@ make_plpgsql_expr(const char *query,
 	expr->ns = plpgsql_ns_top();
 	/* might get changed later during parsing: */
 	expr->target_param = -1;
+	expr->target_is_local = false;
 	/* other fields are left as zeroes until first execution */
 	return expr;
 }
@@ -2687,9 +2690,21 @@ mark_expr_as_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
 	 * other DTYPEs, so no need to mark in other cases.
 	 */
 	if (target->dtype == PLPGSQL_DTYPE_VAR)
+	{
 		expr->target_param = target->dno;
+
+		/*
+		 * For now, assume the target is local to the nearest enclosing
+		 * exception block.  That's correct if the function contains no
+		 * exception blocks; otherwise we'll update this later.
+		 */
+		expr->target_is_local = true;
+	}
 	else
+	{
 		expr->target_param = -1;	/* should be that already */
+		expr->target_is_local = false; /* ditto */
+	}
 }
 
 /* Convenience routine to read an expression with one possible terminator */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index b0052167ee..2fa6d73cab 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -225,9 +225,12 @@ typedef struct PLpgSQL_expr
 	/*
 	 * These fields are used to help optimize assignments to expanded-datum
 	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 * simple variable, target_param holds that variable's dno (else it's -1),
+	 * and target_is_local indicates whether the target is declared inside the
+	 * closest exception block containing the assignment.
 	 */
 	int			target_param;	/* dno of assign target, or -1 if none */
+	bool		target_is_local;	/* is it within nearest exception block? */
 
 	/*
 	 * Fields above are set during plpgsql parsing.  Remaining fields are left
@@ -1014,6 +1017,7 @@ typedef struct PLpgSQL_function
 	/* data derived while parsing body */
 	unsigned int nstatements;	/* counter for assigning stmtids */
 	bool		requires_procedure_resowner;	/* contains CALL or DO? */
+	bool		has_exception_block;	/* contains BEGIN...EXCEPTION? */
 
 	/* these fields change when the function is used */
 	struct PLpgSQL_execstate *cur_estate;
@@ -1312,6 +1316,7 @@ extern PLpgSQL_nsitem *plpgsql_ns_find_nearest_loop(PLpgSQL_nsitem *ns_cur);
  */
 extern PGDLLEXPORT const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt);
 extern const char *plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind);
+extern void plpgsql_mark_local_assignment_targets(PLpgSQL_function *func);
 extern void plpgsql_free_function_memory(PLpgSQL_function *func);
 extern void plpgsql_dumptree(PLpgSQL_function *func);
 
-- 
2.43.5
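
The marking rule in v4-0002 above can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch's code: ToyAssign, ToyBlock, toy_mark_block and toy_mark_function are hypothetical names, a 64-bit mask stands in for a Bitmapset (so dnos must be below 64), and exception-handler bodies are folded into the block body. A block with an EXCEPTION clause starts over with an empty local set, so even its own declarations are nonlocal. A block without one adds its declarations to the inherited set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyAssign
{
    int     target_dno;
    bool    target_is_local;
} ToyAssign;

typedef struct ToyBlock
{
    bool              has_exception;  /* BEGIN ... EXCEPTION ... END? */
    const int        *declared;       /* dnos from the DECLARE section */
    int               n_declared;
    ToyAssign       **assigns;        /* assignments directly in the body */
    int               n_assigns;
    struct ToyBlock **children;       /* nested blocks */
    int               n_children;
} ToyBlock;

static void
toy_mark_block(ToyBlock *block, uint64_t local_dnos)
{
    int i;

    if (block->has_exception)
        local_dnos = 0;     /* new exception scope: nothing is local */
    else
        for (i = 0; i < block->n_declared; i++)
            local_dnos |= UINT64_C(1) << block->declared[i];

    for (i = 0; i < block->n_assigns; i++)
    {
        ToyAssign *a = block->assigns[i];

        a->target_is_local = ((local_dnos >> a->target_dno) & 1) != 0;
    }
    for (i = 0; i < block->n_children; i++)
        toy_mark_block(block->children[i], local_dnos);
}

/* Function arguments count as local at the outermost level. */
static void
toy_mark_function(ToyBlock *top, const int *arg_dnos, int n_args)
{
    uint64_t local_dnos = 0;
    int      i;

    for (i = 0; i < n_args; i++)
        local_dnos |= UINT64_C(1) << arg_dnos[i];
    toy_mark_block(top, local_dnos);
}
```

In this model, an assignment to a function argument is local at top level but becomes nonlocal once it sits inside a block with an exception clause. A variable declared in a plain nested block inside that exception block is local again.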



  [text/x-diff] v4-0003-Implement-new-optimization-rule-for-updates-of-ex.patch (26.8K, 4-v4-0003-Implement-new-optimization-rule-for-updates-of-ex.patch)
From 079105c97093fc759bd6e0edc9e5e03edd33ad90 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Thu, 30 Jan 2025 20:38:36 -0500
Subject: [PATCH v4 3/4] Implement new optimization rule for updates of
 expanded variables.

If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/include/executor/execExpr.h               |   1 +
 src/pl/plpgsql/src/expected/plpgsql_array.out |   9 +
 src/pl/plpgsql/src/pl_exec.c                  | 383 +++++++++++++++---
 src/pl/plpgsql/src/plpgsql.h                  |  22 +-
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |   9 +
 src/tools/pgindent/typedefs.list              |   2 +
 6 files changed, 364 insertions(+), 62 deletions(-)

diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 51bd35dcb0..191d8fe34d 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -425,6 +425,7 @@ typedef struct ExprEvalStep
 		{
 			ExecEvalSubroutine paramfunc;	/* add-on evaluation subroutine */
 			void	   *paramarg;	/* private data for same */
+			void	   *paramarg2;	/* more private data for same */
 			int			paramid;	/* numeric ID for parameter */
 			Oid			paramtype;	/* OID of parameter's datatype */
 		}			cparam;
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index ad60e0e8be..e5db6d6087 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -52,6 +52,15 @@ NOTICE:  a = ("{""(,11)""}",), a.c1[1].i = 11
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 NOTICE:  a = {1,2,3}
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index fec1811ae1..28b6c85d8d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -251,6 +251,15 @@ static HTAB *shared_cast_hash = NULL;
 	else \
 		Assert(rc == PLPGSQL_RC_OK)
 
+/* State struct for count_param_references */
+typedef struct count_param_references_context
+{
+	int			paramid;
+	int			count;
+	Param	   *last_param;
+} count_param_references_context;
+
+
 /************************************************************
  * Local function forward declarations
  ************************************************************/
@@ -336,7 +345,9 @@ static void exec_prepare_plan(PLpgSQL_execstate *estate,
 static void exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr);
 static bool exec_is_simple_query(PLpgSQL_expr *expr);
 static void exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan);
-static void exec_check_rw_parameter(PLpgSQL_expr *expr);
+static void exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid);
+static bool count_param_references(Node *node,
+								   count_param_references_context *context);
 static void exec_check_assignable(PLpgSQL_execstate *estate, int dno);
 static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
 								  PLpgSQL_expr *expr,
@@ -384,6 +395,10 @@ static ParamExternData *plpgsql_param_fetch(ParamListInfo params,
 static void plpgsql_param_compile(ParamListInfo params, Param *param,
 								  ExprState *state,
 								  Datum *resv, bool *resnull);
+static void plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+										 ExprContext *econtext);
+static void plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+											ExprContext *econtext);
 static void plpgsql_param_eval_var(ExprState *state, ExprEvalStep *op,
 								   ExprContext *econtext);
 static void plpgsql_param_eval_var_ro(ExprState *state, ExprEvalStep *op,
@@ -6078,10 +6093,13 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 
 		/*
 		 * Reset to "not simple" to leave sane state (with no dangling
-		 * pointers) in case we fail while replanning.  expr_simple_plansource
-		 * can be left alone however, as that cannot move.
+		 * pointers) in case we fail while replanning.  We'll need to
+		 * re-determine simplicity and R/W optimizability anyway, since those
+		 * could change with the new plan.  expr_simple_plansource can be left
+		 * alone however, as that cannot move.
 		 */
 		expr->expr_simple_expr = NULL;
+		expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 		expr->expr_rw_param = NULL;
 		expr->expr_simple_plan = NULL;
 		expr->expr_simple_plan_lxid = InvalidLocalTransactionId;
@@ -6439,16 +6457,27 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	scratch.resnull = resnull;
 
 	/*
-	 * Select appropriate eval function.  It seems worth special-casing
-	 * DTYPE_VAR and DTYPE_RECFIELD for performance.  Also, we can determine
-	 * in advance whether MakeExpandedObjectReadOnly() will be required.
-	 * Currently, only VAR/PROMISE and REC datums could contain read/write
-	 * expanded objects.
+	 * Select appropriate eval function.
+	 *
+	 * First, if this Param references the same varlena-type DTYPE_VAR datum
+	 * that is the target of the assignment containing this simple expression,
+	 * then it's possible we will be able to optimize handling of R/W expanded
+	 * datums.  We don't want to do the work needed to determine that unless
+	 * we actually see a R/W expanded datum at runtime, so install a checking
+	 * function that will figure that out when needed.
+	 *
+	 * Otherwise, it seems worth special-casing DTYPE_VAR and DTYPE_RECFIELD
+	 * for performance.  Also, we can determine in advance whether
+	 * MakeExpandedObjectReadOnly() will be required.  Currently, only
+	 * VAR/PROMISE and REC datums could contain read/write expanded objects.
 	 */
 	if (datum->dtype == PLPGSQL_DTYPE_VAR)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		bool		isvarlena = (((PLpgSQL_var *) datum)->datatype->typlen == -1);
+
+		if (isvarlena && dno == expr->target_param && expr->expr_simple_expr)
+			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_check;
+		else if (isvarlena)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var;
@@ -6457,14 +6486,12 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_recfield;
 	else if (datum->dtype == PLPGSQL_DTYPE_PROMISE)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		if (((PLpgSQL_var *) datum)->datatype->typlen == -1)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
 	}
-	else if (datum->dtype == PLPGSQL_DTYPE_REC &&
-			 param != expr->expr_rw_param)
+	else if (datum->dtype == PLPGSQL_DTYPE_REC)
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 	else
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
@@ -6473,14 +6500,177 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	 * Note: it's tempting to use paramarg to store the estate pointer and
 	 * thereby save an indirection or two in the eval functions.  But that
 	 * doesn't work because the compiled expression might be used with
-	 * different estates for the same PL/pgSQL function.
+	 * different estates for the same PL/pgSQL function.  Instead, store
+	 * pointers to the PLpgSQL_expr as well as this specific Param, to support
+	 * plpgsql_param_eval_var_check().
 	 */
-	scratch.d.cparam.paramarg = NULL;
+	scratch.d.cparam.paramarg = expr;
+	scratch.d.cparam.paramarg2 = param;
 	scratch.d.cparam.paramid = param->paramid;
 	scratch.d.cparam.paramtype = param->paramtype;
 	ExprEvalPushStep(state, &scratch);
 }
 
+/*
+ * plpgsql_param_eval_var_check		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we may need to determine the applicability of a read/write optimization,
+ * but we've not done that yet.  The work to determine applicability will
+ * be done at most once (per construction of the PL/pgSQL function's cache
+ * entry) when we first see that the target variable's old value is a R/W
+ * expanded object.  If we never do see that, nothing is lost: the amount
+ * of work done by this function in that case is just about the same as
+ * what would be done by plpgsql_param_eval_var_ro, which is what we'd
+ * have used otherwise.
+ */
+static void
+plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+							 ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, it's time to
+	 * decide whether/how to optimize the assignment.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		PLpgSQL_expr *expr = (PLpgSQL_expr *) op->d.cparam.paramarg;
+		Param	   *param = (Param *) op->d.cparam.paramarg2;
+
+		/*
+		 * We might have already figured this out while evaluating some other
+		 * Param referencing the same variable, so check expr_rwopt first.
+		 */
+		if (expr->expr_rwopt == PLPGSQL_RWOPT_UNKNOWN)
+			exec_check_rw_parameter(expr, op->d.cparam.paramid);
+
+		/*
+		 * Update the callback pointer to match what we decided to do, so that
+		 * this function will not be called again.  Then pass off this
+		 * execution to the newly-selected function.
+		 */
+		switch (expr->expr_rwopt)
+		{
+			case PLPGSQL_RWOPT_UNKNOWN:
+				Assert(false);
+				break;
+			case PLPGSQL_RWOPT_NOPE:
+				/* Force the value to read-only in all future executions */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+				plpgsql_param_eval_var_ro(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_TRANSFER:
+				/* There can be only one matching Param in this case */
+				Assert(param == expr->expr_rw_param);
+				/* When the value is read/write, transfer to exec context */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_transfer;
+				plpgsql_param_eval_var_transfer(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_INPLACE:
+				if (param == expr->expr_rw_param)
+				{
+					/* When the value is read/write, deliver it as-is */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var;
+					plpgsql_param_eval_var(state, op, econtext);
+				}
+				else
+				{
+					/* Not the optimizable reference, so force to read-only */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+					plpgsql_param_eval_var_ro(state, op, econtext);
+				}
+				break;
+		}
+		return;
+	}
+
+	/*
+	 * Otherwise, continue to postpone that decision, and execute an inlined
+	 * version of exec_eval_datum().  Although this value could potentially
+	 * need MakeExpandedObjectReadOnly, we know it doesn't right now.
+	 */
+	*op->resvalue = var->value;
+	*op->resnull = var->isnull;
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
+/*
+ * plpgsql_param_eval_var_transfer		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we have determined that a read/write expanded value can be handed off
+ * into execution of the expression (and then possibly returned to our
+ * function's ownership afterwards).  We have to test though, because the
+ * variable might not contain a read/write expanded value during this
+ * execution.
+ */
+static void
+plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+								ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, transfer its
+	 * ownership into the expression execution context, then drop our own
+	 * reference to the value by setting the variable to NULL.  That'll be
+	 * overwritten (perhaps with this same object) when control comes back
+	 * from the expression.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		*op->resvalue = TransferExpandedObject(var->value,
+											   get_eval_mcontext(estate));
+		*op->resnull = false;
+
+		var->value = (Datum) 0;
+		var->isnull = true;
+		var->freeval = false;
+	}
+	else
+	{
+		/*
+		 * Otherwise we can pass the variable's value directly; we now know
+		 * that MakeExpandedObjectReadOnly isn't needed.
+		 */
+		*op->resvalue = var->value;
+		*op->resnull = var->isnull;
+	}
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
 /*
  * plpgsql_param_eval_var		evaluation of EEOP_PARAM_CALLBACK step
  *
@@ -7957,9 +8147,10 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 	MemoryContext oldcontext;
 
 	/*
-	 * Initialize to "not simple".
+	 * Initialize to "not simple", and reset R/W optimizability.
 	 */
 	expr->expr_simple_expr = NULL;
+	expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 	expr->expr_rw_param = NULL;
 
 	/*
@@ -8164,88 +8355,133 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
 	expr->expr_simple_typmod = exprTypmod((Node *) tle_expr);
 	/* We also want to remember if it is immutable or not */
 	expr->expr_simple_mutable = contain_mutable_functions((Node *) tle_expr);
-
-	/*
-	 * Lastly, check to see if there's a possibility of optimizing a
-	 * read/write parameter.
-	 */
-	exec_check_rw_parameter(expr);
 }
 
 /*
  * exec_check_rw_parameter --- can we pass expanded object as read/write param?
  *
- * If we have an assignment like "x := array_append(x, foo)" in which the
+ * There are two separate cases in which we can optimize an update to a
+ * variable that has a read/write expanded value by letting the called
+ * expression operate directly on the expanded value.  In both cases we
+ * are considering assignments like "var := array_append(var, foo)" where
+ * the assignment target is also an input to the RHS expression.
+ *
+ * Case 1 (RWOPT_TRANSFER rule): if the variable is "local" in the sense that
+ * its declaration is not outside any BEGIN...EXCEPTION block surrounding the
+ * assignment, then we do not need to worry about preserving its value if the
+ * RHS expression throws an error.  If in addition the variable is referenced
+ * exactly once in the RHS expression, then we can optimize by converting the
+ * read/write expanded value into a transient value within the expression
+ * evaluation context, and then setting the variable's recorded value to NULL
+ * to prevent double-free attempts.  This works regardless of any other
+ * details of the RHS expression.  If the expression eventually returns that
+ * same expanded object (possibly modified) then the variable will re-acquire
+ * ownership; while if it returns something else or throws an error, the
+ * expanded object will be discarded as part of cleanup of the evaluation
+ * context.
+ *
+ * Case 2 (RWOPT_INPLACE rule): if we have a non-local assignment or if
+ * it looks like "var := array_append(var, var[1])" with multiple references
+ * to the target variable, then we can't use case 1.  Nonetheless, if the
  * top-level function is trusted not to corrupt its argument in case of an
- * error, then when x has an expanded object as value, it is safe to pass the
- * value as a read/write pointer and let the function modify the value
- * in-place.
+ * error, then when the var has an expanded object as value, it is safe to
+ * pass the value as a read/write pointer to the top-level function and let
+ * the function modify the value in-place.  (Any other references have to be
+ * passed as read-only pointers as usual.)  Only the top-level function has to
+ * be trusted, since if anything further down fails, the object hasn't been
+ * modified yet.
  *
- * This function checks for a safe expression, and sets expr->expr_rw_param
- * to the address of any Param within the expression that can be passed as
- * read/write (there can be only one); or to NULL when there is no safe Param.
+ * This function checks to see if the assignment is optimizable according
+ * to either rule, and updates expr->expr_rwopt accordingly.  In addition,
+ * it sets expr->expr_rw_param to the address of the Param within the
+ * expression that can be passed as read/write (there can be only one);
+ * or to NULL when there is no safe Param.
  *
- * Note that this mechanism intentionally applies the safety labeling to just
- * one Param; the expression could contain other Params referencing the target
- * variable, but those must still be treated as read-only.
+ * Note that this mechanism intentionally allows just one Param to emit a
+ * read/write pointer; in case 2, the expression could contain other Params
+ * referencing the target variable, but those must be treated as read-only.
  *
  * Also note that we only apply this optimization within simple expressions.
  * There's no point in it for non-simple expressions, because the
  * exec_run_select code path will flatten any expanded result anyway.
- * Also, it's safe to assume that an expr_simple_expr tree won't get copied
- * somewhere before it gets compiled, so that looking for pointer equality
- * to expr_rw_param will work for matching the target Param.  That'd be much
- * shakier in the general case.
  */
 static void
-exec_check_rw_parameter(PLpgSQL_expr *expr)
+exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 {
-	int			target_dno;
+	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
 	ListCell   *lc;
 
 	/* Assume unsafe */
+	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
 	expr->expr_rw_param = NULL;
 
-	/* Done if expression isn't an assignment source */
-	target_dno = expr->target_param;
-	if (target_dno < 0)
-		return;
+	/* Shouldn't be here for non-simple expression */
+	Assert(sexpr != NULL);
+
+	/* Param should match the expression's assignment target, too */
+	Assert(paramid == expr->target_param + 1);
 
 	/*
-	 * If target variable isn't referenced by expression, no need to look
-	 * further.
+	 * If the assignment is to a "local" variable (one whose value won't
+	 * matter anymore if expression evaluation fails), and this Param is the
+	 * only reference to that variable in the expression, then we can
+	 * unconditionally optimize using the "transfer" method.
 	 */
-	if (!bms_is_member(target_dno, expr->paramnos))
-		return;
+	if (expr->target_is_local)
+	{
+		count_param_references_context context;
 
-	/* Shouldn't be here for non-simple expression */
-	Assert(expr->expr_simple_expr != NULL);
+		/* See how many references there are, and find one of them */
+		context.paramid = paramid;
+		context.count = 0;
+		context.last_param = NULL;
+		(void) count_param_references((Node *) sexpr, &context);
+
+		/* If we're here, the expr must contain some reference to the var */
+		Assert(context.count > 0);
+
+		/* If exactly one reference, success! */
+		if (context.count == 1)
+		{
+			expr->expr_rwopt = PLPGSQL_RWOPT_TRANSFER;
+			expr->expr_rw_param = context.last_param;
+			return;
+		}
+	}
 
 	/*
+	 * Otherwise, see if we can trust the expression's top-level function to
+	 * apply the "inplace" method.
+	 *
 	 * Top level of expression must be a simple FuncExpr, OpExpr, or
-	 * SubscriptingRef, else we can't optimize.
+	 * SubscriptingRef, else we can't identify which function is relevant. But
+	 * it's okay to look through any RelabelType above that, since that can't
+	 * fail.
 	 */
-	if (IsA(expr->expr_simple_expr, FuncExpr))
+	if (IsA(sexpr, RelabelType))
+		sexpr = ((RelabelType *) sexpr)->arg;
+	if (IsA(sexpr, FuncExpr))
 	{
-		FuncExpr   *fexpr = (FuncExpr *) expr->expr_simple_expr;
+		FuncExpr   *fexpr = (FuncExpr *) sexpr;
 
 		funcid = fexpr->funcid;
 		fargs = fexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, OpExpr))
+	else if (IsA(sexpr, OpExpr))
 	{
-		OpExpr	   *opexpr = (OpExpr *) expr->expr_simple_expr;
+		OpExpr	   *opexpr = (OpExpr *) sexpr;
 
 		funcid = opexpr->opfuncid;
 		fargs = opexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, SubscriptingRef))
+	else if (IsA(sexpr, SubscriptingRef))
 	{
-		SubscriptingRef *sbsref = (SubscriptingRef *) expr->expr_simple_expr;
+		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
 		/* We only trust standard varlena arrays to be safe */
+		/* TODO: install some extensibility here */
 		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
 			F_ARRAY_SUBSCRIPT_HANDLER)
 			return;
@@ -8256,9 +8492,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) sbsref->refexpr;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8293,9 +8530,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) arg;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8303,6 +8541,35 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 	}
 }
 
+/*
+ * Count Params referencing the specified paramid, and return one of them
+ * if there are any.
+ *
+ * We actually only need to distinguish 0, 1, and N references; so we can
+ * abort the tree traversal as soon as we've found two.
+ */
+static bool
+count_param_references(Node *node, count_param_references_context *context)
+{
+	if (node == NULL)
+		return false;
+	else if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		if (param->paramkind == PARAM_EXTERN &&
+			param->paramid == context->paramid)
+		{
+			context->last_param = param;
+			if (++(context->count) > 1)
+				return true;	/* abort tree traversal */
+		}
+		return false;
+	}
+	else
+		return expression_tree_walker(node, count_param_references, context);
+}
+
 /*
  * exec_check_assignable --- is it OK to assign to the indicated datum?
  *
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 2fa6d73cab..d73996e09c 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -187,6 +187,17 @@ typedef enum PLpgSQL_resolve_option
 	PLPGSQL_RESOLVE_COLUMN,		/* prefer table column to plpgsql var */
 } PLpgSQL_resolve_option;
 
+/*
+ * Status of optimization of assignment to a read/write expanded object
+ */
+typedef enum PLpgSQL_rwopt
+{
+	PLPGSQL_RWOPT_UNKNOWN = 0,	/* applicability not determined yet */
+	PLPGSQL_RWOPT_NOPE,			/* cannot do any optimization */
+	PLPGSQL_RWOPT_TRANSFER,		/* transfer the old value into expr state */
+	PLPGSQL_RWOPT_INPLACE,		/* pass value as R/W to top-level function */
+} PLpgSQL_rwopt;
+
 
 /**********************************************************************
  * Node and structure definitions
@@ -246,11 +257,14 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * If we match a Param within expr_simple_expr to the variable identified
-	 * by target_param, that Param's address is stored in expr_rw_param; then
-	 * expression code generation will allow the value for that Param to be
-	 * passed as a read/write expanded-object pointer.
+	 * expr_rwopt tracks whether we have determined that assignment to a
+	 * read/write expanded object (stored in the target_param datum) can be
+	 * optimized by passing it to the expr as a read/write expanded-object
+	 * pointer.  If so, expr_rw_param identifies the specific Param that
+	 * should emit a read/write pointer; any others will emit read-only
+	 * pointers.
 	 */
+	PLpgSQL_rwopt expr_rwopt;	/* can we apply R/W optimization? */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4b9ff51594..4a346203dc 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -48,6 +48,15 @@ begin a.c1[1].i := 11; raise notice 'a = %, a.c1[1].i = %', a, a.c1[1].i; end$$;
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+
 create temp table onecol as select array[1,2] as f1;
 
 do $$ declare a int[];
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a2644a2e65..fcefb1231c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1873,6 +1873,7 @@ PLpgSQL_rec
 PLpgSQL_recfield
 PLpgSQL_resolve_option
 PLpgSQL_row
+PLpgSQL_rwopt
 PLpgSQL_stmt
 PLpgSQL_stmt_assert
 PLpgSQL_stmt_assign
@@ -3412,6 +3413,7 @@ core_yy_extra_type
 core_yyscan_t
 corrupt_items
 cost_qual_eval_context
+count_param_references_context
 cp_hash_func
 create_upper_paths_hook_type
 createdb_failure_params
-- 
2.43.5
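
As a reading aid for the v4-0003 patch above, here is a minimal standalone sketch of how the two rules in the exec_check_rw_parameter() comment pick a method. It is a toy model, not PostgreSQL code: ToyNode, ToyKind, ToyRwopt, toy_count_refs() and toy_choose_rwopt() are hypothetical names, and the "trusted" flag merely stands in for whatever test the real code applies to the top-level function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for expression nodes; not PostgreSQL types. */
typedef enum { TOY_CONST, TOY_PARAM, TOY_CALL } ToyKind;
typedef enum { TOY_RWOPT_NOPE, TOY_RWOPT_TRANSFER, TOY_RWOPT_INPLACE } ToyRwopt;

typedef struct ToyNode
{
	ToyKind		kind;
	int			paramid;	/* TOY_PARAM only: which variable is referenced */
	bool		trusted;	/* TOY_CALL only: assumed safe for in-place update */
	const struct ToyNode *args[2];	/* TOY_CALL only; unused slots are NULL */
} ToyNode;

/* Count references to paramid anywhere in the expression tree. */
int
toy_count_refs(const ToyNode *node, int paramid)
{
	if (node == NULL)
		return 0;
	if (node->kind == TOY_PARAM)
		return (node->paramid == paramid) ? 1 : 0;
	if (node->kind == TOY_CALL)
		return toy_count_refs(node->args[0], paramid) +
			toy_count_refs(node->args[1], paramid);
	return 0;					/* constants reference nothing */
}

/* Apply the "transfer" rule first, then fall back to the "inplace" rule. */
ToyRwopt
toy_choose_rwopt(const ToyNode *expr, int paramid, bool target_is_local)
{
	assert(expr != NULL);

	/* Rule 1: local target referenced exactly once anywhere in the RHS. */
	if (target_is_local && toy_count_refs(expr, paramid) == 1)
		return TOY_RWOPT_TRANSFER;

	/* Rule 2: trusted top-level call taking the target as a direct argument. */
	if (expr->kind == TOY_CALL && expr->trusted)
	{
		for (int i = 0; i < 2; i++)
		{
			const ToyNode *arg = expr->args[i];

			if (arg != NULL && arg->kind == TOY_PARAM &&
				arg->paramid == paramid)
				return TOY_RWOPT_INPLACE;
		}
	}
	return TOY_RWOPT_NOPE;
}
```

Modeling the three regression-test statements this way, "a := array_append(a, 42)" with a local target has one reference and selects the transfer method; "a := a || a[3]" has two references, so it falls through to the in-place method provided the top-level function is trusted; and "a := a || a" qualifies for neither when its top-level function is not trusted.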


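The v4-0003 patch above also postpones the optimizability check until a read/write expanded value is actually seen, and then swaps the step's callback so the check never runs again. The following is a rough standalone sketch of that self-replacing callback pattern only. ToyStep, ToyEvalFn, ToyPath and the toy_eval_* functions are hypothetical, and the "decision" is reduced to a counter rather than the real analysis.

```c
#include <assert.h>
#include <stdbool.h>

/* Which code path handled a given evaluation (hypothetical). */
typedef enum { TOY_PATH_POSTPONED, TOY_PATH_DECIDED } ToyPath;

typedef struct ToyStep ToyStep;
typedef ToyPath (*ToyEvalFn) (ToyStep *step, bool value_is_rw);

struct ToyStep
{
	ToyEvalFn	fn;				/* callback currently installed for the step */
	int			decisions;		/* how many times the costly decision ran */
};

/* Path installed once the decision has been made. */
ToyPath
toy_eval_decided(ToyStep *step, bool value_is_rw)
{
	(void) step;
	(void) value_is_rw;
	return TOY_PATH_DECIDED;
}

/*
 * Checking callback: does nothing expensive until it first sees a
 * read/write value, then decides once and replaces itself.
 */
ToyPath
toy_eval_check(ToyStep *step, bool value_is_rw)
{
	if (value_is_rw)
	{
		step->decisions++;		/* stands in for the one-time analysis */
		step->fn = toy_eval_decided;	/* never come back here */
		return step->fn(step, value_is_rw);
	}
	return TOY_PATH_POSTPONED;	/* keep postponing the decision */
}
```

A caller would initialize a step with fn = toy_eval_check and always invoke step.fn(&step, ...); the counter then shows that the analysis runs at most once per step no matter how many later evaluations occur.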

  [text/x-diff] v4-0004-Allow-extension-functions-to-participate-in-in-pl.patch (17.1K, 5-v4-0004-Allow-extension-functions-to-participate-in-in-pl.patch)
From 243b7eb49ae04f4f1680da087d3b8866db471dbb Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Thu, 30 Jan 2025 20:48:21 -0500
Subject: [PATCH v4 4/4] Allow extension functions to participate in in-place
 updates.

Commit 1dc5ebc90 allowed PL/pgSQL to perform in-place updates
of expanded-object variables that are being updated with
assignments like "x := f(x, ...)".  However this was allowed
only for a hard-wired list of functions f(), since we need to
be sure that f() will not modify the variable if it fails.
It was always envisioned that we should make that extensible,
but at the time we didn't have a good way to do so.  Since
then we've invented the idea of "support functions" to allow
attaching specialized optimization knowledge to functions,
and that is a perfect mechanism for doing this.

Hence, adjust PL/pgSQL to use a support function request instead
of hard-wired logic to decide if in-place update is safe.
Preserve the previous optimizations by creating support functions
for the three functions that were previously hard-wired.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/backend/utils/adt/array_userfuncs.c       | 61 +++++++++++++
 src/backend/utils/adt/arraysubs.c             | 34 ++++++++
 src/include/catalog/pg_proc.dat               | 20 +++--
 src/include/nodes/supportnodes.h              | 55 +++++++++++-
 src/pl/plpgsql/src/expected/plpgsql_array.out |  3 +-
 src/pl/plpgsql/src/pl_exec.c                  | 86 ++++++++-----------
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |  1 +
 src/tools/pgindent/typedefs.list              |  1 +
 8 files changed, 202 insertions(+), 59 deletions(-)

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 0b02fe3744..2aae2f8ed9 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -16,6 +16,7 @@
 #include "common/int.h"
 #include "common/pg_prng.h"
 #include "libpq/pqformat.h"
+#include "nodes/supportnodes.h"
 #include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
@@ -167,6 +168,36 @@ array_append(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_append_support()
+ *
+ * Planner support function for array_append()
+ */
+Datum
+array_append_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place appends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) linitial(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_prepend :
  *		push an element onto the front of a one-dimensional array
@@ -230,6 +261,36 @@ array_prepend(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_prepend_support()
+ *
+ * Planner support function for array_prepend()
+ */
+Datum
+array_prepend_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place prepends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) lsecond(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_cat :
  *		concatenate two nD arrays to form an nD array, or
diff --git a/src/backend/utils/adt/arraysubs.c b/src/backend/utils/adt/arraysubs.c
index 562179b379..2940fb8e8d 100644
--- a/src/backend/utils/adt/arraysubs.c
+++ b/src/backend/utils/adt/arraysubs.c
@@ -18,6 +18,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/subscripting.h"
+#include "nodes/supportnodes.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_expr.h"
 #include "utils/array.h"
@@ -575,3 +576,36 @@ raw_array_subscript_handler(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(&sbsroutines);
 }
+
+/*
+ * array_subscript_handler_support()
+ *
+ * Planner support function for array_subscript_handler()
+ */
+Datum
+array_subscript_handler_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place subscripted assignment if the refexpr is
+		 * the array being assigned to.  We don't need to worry about array
+		 * references within the refassgnexpr or the subscripts; however, if
+		 * there's no refassgnexpr then it's a fetch which there's no need to
+		 * optimize.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *refexpr = (Param *) linitial(req->args);
+
+		if (refexpr && IsA(refexpr, Param) &&
+			refexpr->paramkind == PARAM_EXTERN &&
+			refexpr->paramid == req->paramid &&
+			lsecond(req->args) != NULL)
+			ret = (Node *) refexpr;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a5..9e803d610d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1598,14 +1598,20 @@
   proname => 'cardinality', prorettype => 'int4', proargtypes => 'anyarray',
   prosrc => 'array_cardinality' },
 { oid => '378', descr => 'append element onto end of array',
-  proname => 'array_append', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_append', prosupport => 'array_append_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_append' },
+{ oid => '8680', descr => 'planner support for array_append',
+  proname => 'array_append_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_append_support' },
 { oid => '379', descr => 'prepend element onto front of array',
-  proname => 'array_prepend', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_prepend', prosupport => 'array_prepend_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatible anycompatiblearray',
   prosrc => 'array_prepend' },
+{ oid => '8681', descr => 'planner support for array_prepend',
+  proname => 'array_prepend_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_prepend_support' },
 { oid => '383',
   proname => 'array_cat', proisstrict => 'f',
   prorettype => 'anycompatiblearray',
@@ -12207,8 +12213,12 @@
 
 # subscripting support for built-in types
 { oid => '6179', descr => 'standard array subscripting support',
-  proname => 'array_subscript_handler', prorettype => 'internal',
+  proname => 'array_subscript_handler',
+  prosupport => 'array_subscript_handler_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'array_subscript_handler' },
+{ oid => '8682', descr => 'planner support for array_subscript_handler',
+  proname => 'array_subscript_handler_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_subscript_handler_support' },
 { oid => '6180', descr => 'raw array subscripting support',
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index ad5d43a2a7..9c047cc401 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -6,10 +6,10 @@
  * This file defines the API for "planner support functions", which
  * are SQL functions (normally written in C) that can be attached to
  * another "target" function to give the system additional knowledge
- * about the target function.  All the current capabilities have to do
- * with planning queries that use the target function, though it is
- * possible that future extensions will add functionality to be invoked
- * by the parser or executor.
+ * about the target function.  The name is now something of a misnomer,
+ * since some of the call sites are in the executor not the planner,
+ * but "function support function" would be a confusing name so we
+ * stick with "planner support function".
  *
  * A support function must have the SQL signature
  *		supportfn(internal) returns internal
@@ -343,4 +343,51 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+/*
+ * The ModifyInPlace request allows the support function to detect whether
+ * a call to its target function can be allowed to modify a read/write
+ * expanded object in-place.  The context is that we are considering a
+ * PL/pgSQL (or similar PL) assignment of the form "x := f(x, ...)" where
+ * the variable x is of a type that can be represented as an expanded object
+ * (see utils/expandeddatum.h).  If f() can usefully optimize by modifying
+ * the passed-in object in-place, then this request can be implemented to
+ * instruct PL/pgSQL to pass a read-write expanded pointer to the variable's
+ * value.  (Note that there is no guarantee that later calls to f() will
+ * actually do so.  If f() receives a read-only pointer, or a pointer to a
+ * non-expanded object, it must follow the usual convention of not modifying
+ * the pointed-to object.)  There are two requirements that must be met
+ * to make this safe:
+ * 1. f() must guarantee that it will not have modified the object if it
+ * fails.  Otherwise the variable's value might change unexpectedly.
+ * 2. If the other arguments to f() ("..." in the above example) contain
+ * references to x, f() must be able to cope with that; or if that's not
+ * safe, the support function must scan the other arguments to verify that
+ * there are no other references to x.  An example of the concern here is
+ * that in "arr := array_append(arr, arr[1])", if the array element type
+ * is pass-by-reference then array_append would receive a second argument
+ * that points into the array object it intends to modify.  array_append is
+ * coded to make that safe, but other functions might not be able to cope.
+ *
+ * "args" is a node tree list representing the function's arguments.
+ * One or more nodes within the node tree will be PARAM_EXTERN Params
+ * with ID "paramid", which represent the assignment target variable.
+ * (Note that such references are not necessarily at top level in the list,
+ * for example we might have "x := f(x, g(x))".  Generally it's only safe
+ * to optimize a reference that is at top level, else we're making promises
+ * about the behavior of g() as well as f().)
+ *
+ * If modify-in-place is safe, the support function should return the
+ * address of the Param node that is to return a read-write pointer.
+ * (At most one of the references is allowed to do so.)  Otherwise,
+ * return NULL.
+ */
+typedef struct SupportRequestModifyInPlace
+{
+	NodeTag		type;
+
+	Oid			funcid;			/* PG_PROC OID of the target function */
+	List	   *args;			/* Arguments to the function */
+	int			paramid;		/* ID of Param(s) representing variable */
+} SupportRequestModifyInPlace;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index e5db6d6087..4c6b3ce998 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -57,10 +57,11 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
-NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
+NOTICE:  a = {1,1,2,3,42,3,1,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 28b6c85d8d..d4377ceecb 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -29,6 +29,7 @@
 #include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
@@ -8411,7 +8412,7 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
-	ListCell   *lc;
+	Oid			prosupport;
 
 	/* Assume unsafe */
 	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
@@ -8480,64 +8481,51 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	{
 		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
-		/* We only trust standard varlena arrays to be safe */
-		/* TODO: install some extensibility here */
-		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
-			F_ARRAY_SUBSCRIPT_HANDLER)
-			return;
-
-		/* We can optimize the refexpr if it's the target, otherwise not */
-		if (sbsref->refexpr && IsA(sbsref->refexpr, Param))
-		{
-			Param	   *param = (Param *) sbsref->refexpr;
+		funcid = get_typsubscript(sbsref->refcontainertype, NULL);
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
-
-		return;
+		/*
+		 * We assume that only the refexpr and refassgnexpr (if any) are
+		 * relevant to the support function's decision.  If that turns out to
+		 * be a bad idea, we could incorporate the subscript expressions into
+		 * the fargs list somehow.
+		 */
+		fargs = list_make2(sbsref->refexpr, sbsref->refassgnexpr);
 	}
 	else
 		return;
 
 	/*
-	 * The top-level function must be one that we trust to be "safe".
-	 * Currently we hard-wire the list, but it would be very desirable to
-	 * allow extensions to mark their functions as safe ...
+	 * The top-level function must be one that can handle in-place update
+	 * safely.  We allow functions to declare their ability to do that via a
+	 * support function request.
 	 */
-	if (!(funcid == F_ARRAY_APPEND ||
-		  funcid == F_ARRAY_PREPEND))
-		return;
-
-	/*
-	 * The target variable (in the form of a Param) must appear as a direct
-	 * argument of the top-level function.  References further down in the
-	 * tree can't be optimized; but on the other hand, they don't invalidate
-	 * optimizing the top-level call, since that will be executed last.
-	 */
-	foreach(lc, fargs)
+	prosupport = get_func_support(funcid);
+	if (OidIsValid(prosupport))
 	{
-		Node	   *arg = (Node *) lfirst(lc);
+		SupportRequestModifyInPlace req;
+		Param	   *param;
 
-		if (arg && IsA(arg, Param))
-		{
-			Param	   *param = (Param *) arg;
+		req.type = T_SupportRequestModifyInPlace;
+		req.funcid = funcid;
+		req.args = fargs;
+		req.paramid = paramid;
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
+		param = (Param *)
+			DatumGetPointer(OidFunctionCall1(prosupport,
+											 PointerGetDatum(&req)));
+
+		if (param == NULL)
+			return;				/* support function fails */
+
+		/* Verify support function followed the API */
+		Assert(IsA(param, Param));
+		Assert(param->paramkind == PARAM_EXTERN);
+		Assert(param->paramid == paramid);
+
+		/* Found the Param we want to pass as read/write */
+		expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
+		expr->expr_rw_param = param;
+		return;
 	}
 }
 
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4a346203dc..da984a9941 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -53,6 +53,7 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index fcefb1231c..8e600d6bc2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2804,6 +2804,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestModifyInPlace
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.43.5
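
For readers who want to see the shape of the ModifyInPlace check without a PostgreSQL build tree, here is a toy standalone model of what the array_append/array_prepend support functions in the patch above do. It is only a sketch under stated assumptions: ToyNode, ToyModifyInPlaceRequest and all the toy_* names are made up for illustration and are not PostgreSQL APIs; the real code works on Param nodes and a List of arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for node tags and Param kinds; not PostgreSQL's. */
typedef enum ToyNodeTag {TOY_PARAM, TOY_CONST} ToyNodeTag;
typedef enum ToyParamKind {TOY_PARAM_EXTERN, TOY_PARAM_EXEC} ToyParamKind;

typedef struct ToyNode
{
	ToyNodeTag	tag;
	ToyParamKind paramkind;		/* meaningful only when tag == TOY_PARAM */
	int			paramid;		/* meaningful only when tag == TOY_PARAM */
} ToyNode;

/* Toy analogue of SupportRequestModifyInPlace (args is an array here). */
typedef struct ToyModifyInPlaceRequest
{
	ToyNode   **args;			/* top-level arguments of f() */
	int			nargs;
	int			paramid;		/* ID of the assignment target variable */
} ToyModifyInPlaceRequest;

/*
 * Return the argument at position argno if it is a top-level reference to
 * the assignment target, else NULL (meaning "do not modify in place").
 */
static ToyNode *
toy_check_arg(const ToyModifyInPlaceRequest *req, int argno)
{
	ToyNode    *arg;

	if (argno >= req->nargs)
		return NULL;
	arg = req->args[argno];
	if (arg && arg->tag == TOY_PARAM &&
		arg->paramkind == TOY_PARAM_EXTERN &&
		arg->paramid == req->paramid)
		return arg;
	return NULL;
}

/* append-style function: the container is the first argument */
static ToyNode *
toy_append_support(const ToyModifyInPlaceRequest *req)
{
	return toy_check_arg(req, 0);
}

/* prepend-style function: the container is the second argument */
static ToyNode *
toy_prepend_support(const ToyModifyInPlaceRequest *req)
{
	return toy_check_arg(req, 1);
}

/* x := append(x, 42), where x has param ID 1: should be optimizable */
int
toy_demo_append_self(void)
{
	ToyNode		x = {TOY_PARAM, TOY_PARAM_EXTERN, 1};
	ToyNode		c = {TOY_CONST, TOY_PARAM_EXTERN, 0};
	ToyNode    *args[] = {&x, &c};
	ToyModifyInPlaceRequest req = {args, 2, 1};

	return toy_append_support(&req) == &x;
}

/* x := append(y, 42), where y has param ID 2: must not be optimized */
int
toy_demo_append_other(void)
{
	ToyNode		y = {TOY_PARAM, TOY_PARAM_EXTERN, 2};
	ToyNode		c = {TOY_CONST, TOY_PARAM_EXTERN, 0};
	ToyNode    *args[] = {&y, &c};
	ToyModifyInPlaceRequest req = {args, 2, 1};

	return toy_append_support(&req) == NULL;
}

/* x := prepend(42, x): the target is found in the second position */
int
toy_demo_prepend_self(void)
{
	ToyNode		c = {TOY_CONST, TOY_PARAM_EXTERN, 0};
	ToyNode		x = {TOY_PARAM, TOY_PARAM_EXTERN, 1};
	ToyNode    *args[] = {&c, &x};
	ToyModifyInPlaceRequest req = {args, 2, 1};

	return toy_prepend_support(&req) == &x;
}
```

The model only looks at one top-level argument position, which mirrors the point in the supportnodes.h comment that it is generally only safe to optimize a reference at top level. It says nothing about the two safety requirements on f() itself; those remain the function author's responsibility.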



* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-02 21:56  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 2 replies; 34+ messages in thread

From: Tom Lane @ 2025-02-02 21:56 UTC (permalink / raw)
  To: Pavel Borisov <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

I wrote:
> Hmm, it seemed to still apply for me.  But anyway, I needed to make
> the other changes, so here's v4.

I decided to see what would happen if we tried to avoid the code
duplication in pl_funcs.c by making some "walker" infrastructure
akin to expression_tree_walker.  While that doesn't seem useful
for the dump_xxx functions, it works very nicely for the free_xxx
functions and now for the mark_xxx ones as well.  pl_funcs.c
nets out about 400 lines shorter than in the v4 patch.  The
code coverage score for the file is still awful :-(, but that's
because we're not testing the dump_xxx functions at all.

PFA v5.  The new 0001 patch refactors the free_xxx infrastructure
to create plpgsql_statement_tree_walker(), and then in what's now
0003 we can use that instead of writing a lot of duplicate code.
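
To illustrate the callback convention, here is a toy standalone sketch. It is not code from the patch: ToyStmt, ToyExpr and the toy_* names are invented for the example, and only three made-up statement kinds are modeled. The point is just that the walker visits a statement's direct children, while the statement callback is what recurses by calling the walker again.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for PLpgSQL_expr / PLpgSQL_stmt; NOT the real structs. */
typedef struct ToyExpr
{
	int			has_plan;		/* pretend "saved plan" flag */
} ToyExpr;

typedef enum ToyStmtKind
{
	TOY_STMT_BLOCK,				/* a list of child statements */
	TOY_STMT_ASSIGN,			/* one expression */
	TOY_STMT_WHILE				/* a condition plus a body */
} ToyStmtKind;

typedef struct ToyStmt
{
	ToyStmtKind kind;
	ToyExpr    *expr;			/* used by ASSIGN and WHILE, else NULL */
	struct ToyStmt **body;		/* used by BLOCK and WHILE, else NULL */
	int			nbody;
} ToyStmt;

typedef void (*toy_stmt_cb) (ToyStmt *stmt, void *context);
typedef void (*toy_expr_cb) (ToyExpr *expr, void *context);

/* Invoke the callbacks on the direct children of stmt; no recursion here. */
static void
toy_statement_tree_walker(ToyStmt *stmt, toy_stmt_cb scb, toy_expr_cb ecb,
						  void *context)
{
	switch (stmt->kind)
	{
		case TOY_STMT_BLOCK:
			for (int i = 0; i < stmt->nbody; i++)
				scb(stmt->body[i], context);
			break;
		case TOY_STMT_ASSIGN:
			ecb(stmt->expr, context);
			break;
		case TOY_STMT_WHILE:
			ecb(stmt->expr, context);
			for (int i = 0; i < stmt->nbody; i++)
				scb(stmt->body[i], context);
			break;
	}
}

/* Expression callback: "free" the plan and count it in *context. */
static void
toy_free_expr(ToyExpr *expr, void *context)
{
	if (expr && expr->has_plan)
	{
		expr->has_plan = 0;
		(*(int *) context)++;
	}
}

/* Statement callback: nothing type-specific to do, so just recurse. */
static void
toy_free_stmt(ToyStmt *stmt, void *context)
{
	if (stmt == NULL)
		return;
	toy_statement_tree_walker(stmt, toy_free_stmt, toy_free_expr, context);
}

/* Build a small tree, walk it, and return how many plans were "freed". */
int
toy_demo_count_freed(void)
{
	ToyExpr		cond = {1};
	ToyExpr		rhs1 = {1};
	ToyExpr		rhs2 = {0};
	ToyStmt		assign1 = {TOY_STMT_ASSIGN, &rhs1, NULL, 0};
	ToyStmt		assign2 = {TOY_STMT_ASSIGN, &rhs2, NULL, 0};
	ToyStmt    *loop_body[] = {&assign1};
	ToyStmt		loop = {TOY_STMT_WHILE, &cond, loop_body, 1};
	ToyStmt    *block_body[] = {&loop, &assign2};
	ToyStmt		block = {TOY_STMT_BLOCK, NULL, block_body, 2};
	int			freed = 0;

	toy_free_stmt(&block, &freed);
	return freed;
}
```

A second use of the same walker would only need its own pair of callbacks; the knowledge of where sub-statements and expressions live stays in one switch.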

			regards, tom lane



Attachments:

  [text/x-diff] v5-0001-Refactor-pl_funcs.c-to-provide-a-usage-independen.patch (17.4K, 2-v5-0001-Refactor-pl_funcs.c-to-provide-a-usage-independen.patch)
From bec67b472a0f9b237c5ed1feffd01ee4428b0688 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Sun, 2 Feb 2025 16:05:01 -0500
Subject: [PATCH v5 1/5] Refactor pl_funcs.c to provide a usage-independent
 tree walker.

We haven't done this up to now because there was only one use-case,
namely plpgsql_free_function_memory's search for expressions to clean
up.  However an upcoming patch has another need for walking plpgsql
functions' statement trees, so let's create sharable tree-walker
infrastructure in the same style as expression_tree_walker().

This patch actually makes the code shorter, although that's
mainly down to having used a more compact coding style.  (I didn't
write a separate subroutine for each statement type, and I made
use of some newer notations like foreach_ptr.)

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_funcs.c | 582 ++++++++++++++--------------------
 1 file changed, 244 insertions(+), 338 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index 8c827fe5cc..88e25b54bc 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -334,387 +334,291 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind)
 
 
 /**********************************************************************
- * Release memory when a PL/pgSQL function is no longer needed
+ * Support for recursing through a PL/pgSQL statement tree
  *
- * The code for recursing through the function tree is really only
- * needed to locate PLpgSQL_expr nodes, which may contain references
- * to saved SPI Plans that must be freed.  The function tree itself,
- * along with subsidiary data, is freed in one swoop by freeing the
- * function's permanent memory context.
+ * The point of this code is to encapsulate knowledge of where the
+ * sub-statements and expressions are in a statement tree, avoiding
+ * duplication of code.  The caller supplies two callbacks, one to
+ * be invoked on statements and one to be invoked on expressions.
+ * (The recursion should be started by invoking the statement callback
+ * on function->action.)  The statement callback should do any
+ * statement-type-specific action it needs, then recurse by calling
+ * plpgsql_statement_tree_walker().  The expression callback can be a
+ * no-op if no per-expression behavior is needed.
  **********************************************************************/
-static void free_stmt(PLpgSQL_stmt *stmt);
-static void free_block(PLpgSQL_stmt_block *block);
-static void free_assign(PLpgSQL_stmt_assign *stmt);
-static void free_if(PLpgSQL_stmt_if *stmt);
-static void free_case(PLpgSQL_stmt_case *stmt);
-static void free_loop(PLpgSQL_stmt_loop *stmt);
-static void free_while(PLpgSQL_stmt_while *stmt);
-static void free_fori(PLpgSQL_stmt_fori *stmt);
-static void free_fors(PLpgSQL_stmt_fors *stmt);
-static void free_forc(PLpgSQL_stmt_forc *stmt);
-static void free_foreach_a(PLpgSQL_stmt_foreach_a *stmt);
-static void free_exit(PLpgSQL_stmt_exit *stmt);
-static void free_return(PLpgSQL_stmt_return *stmt);
-static void free_return_next(PLpgSQL_stmt_return_next *stmt);
-static void free_return_query(PLpgSQL_stmt_return_query *stmt);
-static void free_raise(PLpgSQL_stmt_raise *stmt);
-static void free_assert(PLpgSQL_stmt_assert *stmt);
-static void free_execsql(PLpgSQL_stmt_execsql *stmt);
-static void free_dynexecute(PLpgSQL_stmt_dynexecute *stmt);
-static void free_dynfors(PLpgSQL_stmt_dynfors *stmt);
-static void free_getdiag(PLpgSQL_stmt_getdiag *stmt);
-static void free_open(PLpgSQL_stmt_open *stmt);
-static void free_fetch(PLpgSQL_stmt_fetch *stmt);
-static void free_close(PLpgSQL_stmt_close *stmt);
-static void free_perform(PLpgSQL_stmt_perform *stmt);
-static void free_call(PLpgSQL_stmt_call *stmt);
-static void free_commit(PLpgSQL_stmt_commit *stmt);
-static void free_rollback(PLpgSQL_stmt_rollback *stmt);
-static void free_expr(PLpgSQL_expr *expr);
+typedef void (*plpgsql_stmt_walker_callback) (PLpgSQL_stmt *stmt,
+											  void *context);
+typedef void (*plpgsql_expr_walker_callback) (PLpgSQL_expr *expr,
+											  void *context);
 
+/*
+ * As in nodeFuncs.h, we respectfully decline to support the C standard's
+ * position that a pointer to struct is incompatible with "void *".  Instead,
+ * silence related compiler warnings using casts in this macro wrapper.
+ */
+#define plpgsql_statement_tree_walker(s, sw, ew, c) \
+	plpgsql_statement_tree_walker_impl(s, (plpgsql_stmt_walker_callback) (sw), \
+									   (plpgsql_expr_walker_callback) (ew), c)
 
 static void
-free_stmt(PLpgSQL_stmt *stmt)
+plpgsql_statement_tree_walker_impl(PLpgSQL_stmt *stmt,
+								   plpgsql_stmt_walker_callback stmt_callback,
+								   plpgsql_expr_walker_callback expr_callback,
+								   void *context)
 {
+#define S_WALK(st) stmt_callback(st, context)
+#define E_WALK(ex) expr_callback(ex, context)
+#define S_LIST_WALK(lst) foreach_ptr(PLpgSQL_stmt, st, lst) S_WALK(st)
+#define E_LIST_WALK(lst) foreach_ptr(PLpgSQL_expr, ex, lst) E_WALK(ex)
+
 	switch (stmt->cmd_type)
 	{
 		case PLPGSQL_STMT_BLOCK:
-			free_block((PLpgSQL_stmt_block *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_block *bstmt = (PLpgSQL_stmt_block *) stmt;
+
+				S_LIST_WALK(bstmt->body);
+				if (bstmt->exceptions)
+				{
+					foreach_ptr(PLpgSQL_exception, exc, bstmt->exceptions->exc_list)
+					{
+						/* conditions list has no interesting sub-structure */
+						S_LIST_WALK(exc->action);
+					}
+				}
+				break;
+			}
 		case PLPGSQL_STMT_ASSIGN:
-			free_assign((PLpgSQL_stmt_assign *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_assign *astmt = (PLpgSQL_stmt_assign *) stmt;
+
+				E_WALK(astmt->expr);
+				break;
+			}
 		case PLPGSQL_STMT_IF:
-			free_if((PLpgSQL_stmt_if *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_if *ifstmt = (PLpgSQL_stmt_if *) stmt;
+
+				E_WALK(ifstmt->cond);
+				S_LIST_WALK(ifstmt->then_body);
+				foreach_ptr(PLpgSQL_if_elsif, elif, ifstmt->elsif_list)
+				{
+					E_WALK(elif->cond);
+					S_LIST_WALK(elif->stmts);
+				}
+				S_LIST_WALK(ifstmt->else_body);
+				break;
+			}
 		case PLPGSQL_STMT_CASE:
-			free_case((PLpgSQL_stmt_case *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_case *cstmt = (PLpgSQL_stmt_case *) stmt;
+
+				E_WALK(cstmt->t_expr);
+				foreach_ptr(PLpgSQL_case_when, cwt, cstmt->case_when_list)
+				{
+					E_WALK(cwt->expr);
+					S_LIST_WALK(cwt->stmts);
+				}
+				S_LIST_WALK(cstmt->else_stmts);
+				break;
+			}
 		case PLPGSQL_STMT_LOOP:
-			free_loop((PLpgSQL_stmt_loop *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_loop *lstmt = (PLpgSQL_stmt_loop *) stmt;
+
+				S_LIST_WALK(lstmt->body);
+				break;
+			}
 		case PLPGSQL_STMT_WHILE:
-			free_while((PLpgSQL_stmt_while *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_while *wstmt = (PLpgSQL_stmt_while *) stmt;
+
+				E_WALK(wstmt->cond);
+				S_LIST_WALK(wstmt->body);
+				break;
+			}
 		case PLPGSQL_STMT_FORI:
-			free_fori((PLpgSQL_stmt_fori *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_fori *fori = (PLpgSQL_stmt_fori *) stmt;
+
+				E_WALK(fori->lower);
+				E_WALK(fori->upper);
+				E_WALK(fori->step);
+				S_LIST_WALK(fori->body);
+				break;
+			}
 		case PLPGSQL_STMT_FORS:
-			free_fors((PLpgSQL_stmt_fors *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_fors *fors = (PLpgSQL_stmt_fors *) stmt;
+
+				S_LIST_WALK(fors->body);
+				E_WALK(fors->query);
+				break;
+			}
 		case PLPGSQL_STMT_FORC:
-			free_forc((PLpgSQL_stmt_forc *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_forc *forc = (PLpgSQL_stmt_forc *) stmt;
+
+				S_LIST_WALK(forc->body);
+				E_WALK(forc->argquery);
+				break;
+			}
 		case PLPGSQL_STMT_FOREACH_A:
-			free_foreach_a((PLpgSQL_stmt_foreach_a *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_foreach_a *fstmt = (PLpgSQL_stmt_foreach_a *) stmt;
+
+				E_WALK(fstmt->expr);
+				S_LIST_WALK(fstmt->body);
+				break;
+			}
 		case PLPGSQL_STMT_EXIT:
-			free_exit((PLpgSQL_stmt_exit *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_exit *estmt = (PLpgSQL_stmt_exit *) stmt;
+
+				E_WALK(estmt->cond);
+				break;
+			}
 		case PLPGSQL_STMT_RETURN:
-			free_return((PLpgSQL_stmt_return *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_return *rstmt = (PLpgSQL_stmt_return *) stmt;
+
+				E_WALK(rstmt->expr);
+				break;
+			}
 		case PLPGSQL_STMT_RETURN_NEXT:
-			free_return_next((PLpgSQL_stmt_return_next *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_return_next *rstmt = (PLpgSQL_stmt_return_next *) stmt;
+
+				E_WALK(rstmt->expr);
+				break;
+			}
 		case PLPGSQL_STMT_RETURN_QUERY:
-			free_return_query((PLpgSQL_stmt_return_query *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_return_query *rstmt = (PLpgSQL_stmt_return_query *) stmt;
+
+				E_WALK(rstmt->query);
+				E_WALK(rstmt->dynquery);
+				E_LIST_WALK(rstmt->params);
+				break;
+			}
 		case PLPGSQL_STMT_RAISE:
-			free_raise((PLpgSQL_stmt_raise *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_raise *rstmt = (PLpgSQL_stmt_raise *) stmt;
+
+				E_LIST_WALK(rstmt->params);
+				foreach_ptr(PLpgSQL_raise_option, opt, rstmt->options)
+				{
+					E_WALK(opt->expr);
+				}
+				break;
+			}
 		case PLPGSQL_STMT_ASSERT:
-			free_assert((PLpgSQL_stmt_assert *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_assert *astmt = (PLpgSQL_stmt_assert *) stmt;
+
+				E_WALK(astmt->cond);
+				E_WALK(astmt->message);
+				break;
+			}
 		case PLPGSQL_STMT_EXECSQL:
-			free_execsql((PLpgSQL_stmt_execsql *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_execsql *xstmt = (PLpgSQL_stmt_execsql *) stmt;
+
+				E_WALK(xstmt->sqlstmt);
+				break;
+			}
 		case PLPGSQL_STMT_DYNEXECUTE:
-			free_dynexecute((PLpgSQL_stmt_dynexecute *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_dynexecute *dstmt = (PLpgSQL_stmt_dynexecute *) stmt;
+
+				E_WALK(dstmt->query);
+				E_LIST_WALK(dstmt->params);
+				break;
+			}
 		case PLPGSQL_STMT_DYNFORS:
-			free_dynfors((PLpgSQL_stmt_dynfors *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_dynfors *dstmt = (PLpgSQL_stmt_dynfors *) stmt;
+
+				S_LIST_WALK(dstmt->body);
+				E_WALK(dstmt->query);
+				E_LIST_WALK(dstmt->params);
+				break;
+			}
 		case PLPGSQL_STMT_GETDIAG:
-			free_getdiag((PLpgSQL_stmt_getdiag *) stmt);
-			break;
+			{
+				/* no interesting sub-structure */
+				break;
+			}
 		case PLPGSQL_STMT_OPEN:
-			free_open((PLpgSQL_stmt_open *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_open *ostmt = (PLpgSQL_stmt_open *) stmt;
+
+				E_WALK(ostmt->argquery);
+				E_WALK(ostmt->query);
+				E_WALK(ostmt->dynquery);
+				E_LIST_WALK(ostmt->params);
+				break;
+			}
 		case PLPGSQL_STMT_FETCH:
-			free_fetch((PLpgSQL_stmt_fetch *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_fetch *fstmt = (PLpgSQL_stmt_fetch *) stmt;
+
+				E_WALK(fstmt->expr);
+				break;
+			}
 		case PLPGSQL_STMT_CLOSE:
-			free_close((PLpgSQL_stmt_close *) stmt);
-			break;
+			{
+				/* no interesting sub-structure */
+				break;
+			}
 		case PLPGSQL_STMT_PERFORM:
-			free_perform((PLpgSQL_stmt_perform *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_perform *pstmt = (PLpgSQL_stmt_perform *) stmt;
+
+				E_WALK(pstmt->expr);
+				break;
+			}
 		case PLPGSQL_STMT_CALL:
-			free_call((PLpgSQL_stmt_call *) stmt);
-			break;
+			{
+				PLpgSQL_stmt_call *cstmt = (PLpgSQL_stmt_call *) stmt;
+
+				E_WALK(cstmt->expr);
+				break;
+			}
 		case PLPGSQL_STMT_COMMIT:
-			free_commit((PLpgSQL_stmt_commit *) stmt);
-			break;
 		case PLPGSQL_STMT_ROLLBACK:
-			free_rollback((PLpgSQL_stmt_rollback *) stmt);
-			break;
+			{
+				/* no interesting sub-structure */
+				break;
+			}
 		default:
 			elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type);
 			break;
 	}
 }
 
-static void
-free_stmts(List *stmts)
-{
-	ListCell   *s;
-
-	foreach(s, stmts)
-	{
-		free_stmt((PLpgSQL_stmt *) lfirst(s));
-	}
-}
-
-static void
-free_block(PLpgSQL_stmt_block *block)
-{
-	free_stmts(block->body);
-	if (block->exceptions)
-	{
-		ListCell   *e;
-
-		foreach(e, block->exceptions->exc_list)
-		{
-			PLpgSQL_exception *exc = (PLpgSQL_exception *) lfirst(e);
-
-			free_stmts(exc->action);
-		}
-	}
-}
-
-static void
-free_assign(PLpgSQL_stmt_assign *stmt)
-{
-	free_expr(stmt->expr);
-}
-
-static void
-free_if(PLpgSQL_stmt_if *stmt)
-{
-	ListCell   *l;
-
-	free_expr(stmt->cond);
-	free_stmts(stmt->then_body);
-	foreach(l, stmt->elsif_list)
-	{
-		PLpgSQL_if_elsif *elif = (PLpgSQL_if_elsif *) lfirst(l);
-
-		free_expr(elif->cond);
-		free_stmts(elif->stmts);
-	}
-	free_stmts(stmt->else_body);
-}
-
-static void
-free_case(PLpgSQL_stmt_case *stmt)
-{
-	ListCell   *l;
-
-	free_expr(stmt->t_expr);
-	foreach(l, stmt->case_when_list)
-	{
-		PLpgSQL_case_when *cwt = (PLpgSQL_case_when *) lfirst(l);
-
-		free_expr(cwt->expr);
-		free_stmts(cwt->stmts);
-	}
-	free_stmts(stmt->else_stmts);
-}
-
-static void
-free_loop(PLpgSQL_stmt_loop *stmt)
-{
-	free_stmts(stmt->body);
-}
-
-static void
-free_while(PLpgSQL_stmt_while *stmt)
-{
-	free_expr(stmt->cond);
-	free_stmts(stmt->body);
-}
-
-static void
-free_fori(PLpgSQL_stmt_fori *stmt)
-{
-	free_expr(stmt->lower);
-	free_expr(stmt->upper);
-	free_expr(stmt->step);
-	free_stmts(stmt->body);
-}
-
-static void
-free_fors(PLpgSQL_stmt_fors *stmt)
-{
-	free_stmts(stmt->body);
-	free_expr(stmt->query);
-}
-
-static void
-free_forc(PLpgSQL_stmt_forc *stmt)
-{
-	free_stmts(stmt->body);
-	free_expr(stmt->argquery);
-}
-
-static void
-free_foreach_a(PLpgSQL_stmt_foreach_a *stmt)
-{
-	free_expr(stmt->expr);
-	free_stmts(stmt->body);
-}
-
-static void
-free_open(PLpgSQL_stmt_open *stmt)
-{
-	ListCell   *lc;
-
-	free_expr(stmt->argquery);
-	free_expr(stmt->query);
-	free_expr(stmt->dynquery);
-	foreach(lc, stmt->params)
-	{
-		free_expr((PLpgSQL_expr *) lfirst(lc));
-	}
-}
-
-static void
-free_fetch(PLpgSQL_stmt_fetch *stmt)
-{
-	free_expr(stmt->expr);
-}
 
-static void
-free_close(PLpgSQL_stmt_close *stmt)
-{
-}
-
-static void
-free_perform(PLpgSQL_stmt_perform *stmt)
-{
-	free_expr(stmt->expr);
-}
-
-static void
-free_call(PLpgSQL_stmt_call *stmt)
-{
-	free_expr(stmt->expr);
-}
-
-static void
-free_commit(PLpgSQL_stmt_commit *stmt)
-{
-}
-
-static void
-free_rollback(PLpgSQL_stmt_rollback *stmt)
-{
-}
-
-static void
-free_exit(PLpgSQL_stmt_exit *stmt)
-{
-	free_expr(stmt->cond);
-}
-
-static void
-free_return(PLpgSQL_stmt_return *stmt)
-{
-	free_expr(stmt->expr);
-}
-
-static void
-free_return_next(PLpgSQL_stmt_return_next *stmt)
-{
-	free_expr(stmt->expr);
-}
-
-static void
-free_return_query(PLpgSQL_stmt_return_query *stmt)
-{
-	ListCell   *lc;
-
-	free_expr(stmt->query);
-	free_expr(stmt->dynquery);
-	foreach(lc, stmt->params)
-	{
-		free_expr((PLpgSQL_expr *) lfirst(lc));
-	}
-}
-
-static void
-free_raise(PLpgSQL_stmt_raise *stmt)
-{
-	ListCell   *lc;
-
-	foreach(lc, stmt->params)
-	{
-		free_expr((PLpgSQL_expr *) lfirst(lc));
-	}
-	foreach(lc, stmt->options)
-	{
-		PLpgSQL_raise_option *opt = (PLpgSQL_raise_option *) lfirst(lc);
-
-		free_expr(opt->expr);
-	}
-}
-
-static void
-free_assert(PLpgSQL_stmt_assert *stmt)
-{
-	free_expr(stmt->cond);
-	free_expr(stmt->message);
-}
-
-static void
-free_execsql(PLpgSQL_stmt_execsql *stmt)
-{
-	free_expr(stmt->sqlstmt);
-}
-
-static void
-free_dynexecute(PLpgSQL_stmt_dynexecute *stmt)
-{
-	ListCell   *lc;
-
-	free_expr(stmt->query);
-	foreach(lc, stmt->params)
-	{
-		free_expr((PLpgSQL_expr *) lfirst(lc));
-	}
-}
-
-static void
-free_dynfors(PLpgSQL_stmt_dynfors *stmt)
-{
-	ListCell   *lc;
-
-	free_stmts(stmt->body);
-	free_expr(stmt->query);
-	foreach(lc, stmt->params)
-	{
-		free_expr((PLpgSQL_expr *) lfirst(lc));
-	}
-}
+/**********************************************************************
+ * Release memory when a PL/pgSQL function is no longer needed
+ *
+ * This code only needs to deal with cleaning up PLpgSQL_expr nodes,
+ * which may contain references to saved SPI Plans that must be freed.
+ * The function tree itself, along with subsidiary data, is freed in
+ * one swoop by freeing the function's permanent memory context.
+ **********************************************************************/
+static void free_stmt(PLpgSQL_stmt *stmt, void *context);
+static void free_expr(PLpgSQL_expr *expr, void *context);
 
 static void
-free_getdiag(PLpgSQL_stmt_getdiag *stmt)
+free_stmt(PLpgSQL_stmt *stmt, void *context)
 {
+	if (stmt == NULL)
+		return;
+	plpgsql_statement_tree_walker(stmt, free_stmt, free_expr, NULL);
 }
 
 static void
-free_expr(PLpgSQL_expr *expr)
+free_expr(PLpgSQL_expr *expr, void *context)
 {
 	if (expr && expr->plan)
 	{
@@ -743,8 +647,8 @@ plpgsql_free_function_memory(PLpgSQL_function *func)
 				{
 					PLpgSQL_var *var = (PLpgSQL_var *) d;
 
-					free_expr(var->default_val);
-					free_expr(var->cursor_explicit_expr);
+					free_expr(var->default_val, NULL);
+					free_expr(var->cursor_explicit_expr, NULL);
 				}
 				break;
 			case PLPGSQL_DTYPE_ROW:
@@ -753,7 +657,7 @@ plpgsql_free_function_memory(PLpgSQL_function *func)
 				{
 					PLpgSQL_rec *rec = (PLpgSQL_rec *) d;
 
-					free_expr(rec->default_val);
+					free_expr(rec->default_val, NULL);
 				}
 				break;
 			case PLPGSQL_DTYPE_RECFIELD:
@@ -765,8 +669,7 @@ plpgsql_free_function_memory(PLpgSQL_function *func)
 	func->ndatums = 0;
 
 	/* Release plans in statement tree */
-	if (func->action)
-		free_block(func->action);
+	free_stmt((PLpgSQL_stmt *) func->action, NULL);
 	func->action = NULL;
 
 	/*
@@ -782,6 +685,9 @@ plpgsql_free_function_memory(PLpgSQL_function *func)
 
 /**********************************************************************
  * Debug functions for analyzing the compiled code
+ *
+ * Sadly, there doesn't seem to be any way to let plpgsql_statement_tree_walker
+ * bear some of the burden for this.
  **********************************************************************/
 static int	dump_indent;
 
-- 
2.43.5
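
For readers skimming the end of the patch above, here is a minimal
standalone sketch of the callback pattern that free_stmt and free_expr now
follow: the walker visits only the direct children of a statement, and the
statement callback recurses by calling the walker again.  Everything named
toy_* or Toy* is hypothetical and is not PostgreSQL code; the real walker
is plpgsql_statement_tree_walker.  The counter passed through context is
only an illustration, since the patch itself passes NULL.

```c
#include <stddef.h>

/* Hypothetical miniature parse tree; these are not the PostgreSQL structs. */
typedef struct ToyExpr
{
	void	   *plan;			/* stands in for a saved SPI plan */
} ToyExpr;

typedef struct ToyStmt
{
	ToyExpr    *cond;			/* expression owned by this statement, or NULL */
	struct ToyStmt **body;		/* child statements (entries may be NULL) */
	int			nbody;
} ToyStmt;

typedef void (*toy_stmt_cb) (ToyStmt *stmt, void *context);
typedef void (*toy_expr_cb) (ToyExpr *expr, void *context);

/* Visit the direct children of stmt, but not stmt itself. */
static void
toy_tree_walker(ToyStmt *stmt, toy_stmt_cb stmt_cb, toy_expr_cb expr_cb,
				void *context)
{
	expr_cb(stmt->cond, context);
	for (int i = 0; i < stmt->nbody; i++)
		stmt_cb(stmt->body[i], context);
}

/* Expression callback: drop the plan reference, counting it if asked to. */
static void
toy_free_expr(ToyExpr *expr, void *context)
{
	if (expr && expr->plan)
	{
		if (context)
			++*(int *) context;	/* illustration only */
		expr->plan = NULL;		/* the real code would release the plan here */
	}
}

/* Statement callback: recursion happens by re-entering the walker. */
static void
toy_free_stmt(ToyStmt *stmt, void *context)
{
	if (stmt == NULL)
		return;
	toy_tree_walker(stmt, toy_free_stmt, toy_free_expr, context);
}
```

Calling toy_free_stmt on the root of a tree clears every plan pointer below
it, and a second call finds nothing left to do.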



  [text/x-diff] v5-0002-Preliminary-refactoring.patch (9.9K, 3-v5-0002-Preliminary-refactoring.patch)
From 9ffc27937b8d21336f20cbe7490a0d5e2f244a84 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Sun, 2 Feb 2025 16:05:38 -0500
Subject: [PATCH v5 2/5] Preliminary refactoring.

This short and boring patch simply moves the responsibility for
initializing PLpgSQL_expr.target_param into plpgsql parsing,
rather than doing it at first execution of the expr as before.
This doesn't save anything in terms of runtime, since the work was
trivial and done only once per expr anyway.  But it makes the info
available during parsing, which will be useful for the next step.

Likewise set PLpgSQL_expr.func during parsing.  According to the
comments, this was once impossible; but it's certainly possible
since we invented the plpgsql_curr_compile variable.  Again, this
saves little runtime, but it seems far cleaner conceptually.

While at it, I reordered stuff in struct PLpgSQL_expr to make it
clearer which fields are filled when, and merged some duplicative
code in pl_gram.y.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_exec.c | 27 ---------------
 src/pl/plpgsql/src/pl_gram.y | 65 ++++++++++++++++++++++++------------
 src/pl/plpgsql/src/plpgsql.h | 31 +++++++++--------
 3 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 35cda55cf9..fec1811ae1 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4174,12 +4174,6 @@ exec_prepare_plan(PLpgSQL_execstate *estate,
 	SPIPlanPtr	plan;
 	SPIPrepareOptions options;
 
-	/*
-	 * The grammar can't conveniently set expr->func while building the parse
-	 * tree, so make sure it's set before parser hooks need it.
-	 */
-	expr->func = estate->func;
-
 	/*
 	 * Generate and save the plan
 	 */
@@ -5016,21 +5010,7 @@ exec_assign_expr(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 	 * If first time through, create a plan for this expression.
 	 */
 	if (expr->plan == NULL)
-	{
-		/*
-		 * Mark the expression as being an assignment source, if target is a
-		 * simple variable.  (This is a bit messy, but it seems cleaner than
-		 * modifying the API of exec_prepare_plan for the purpose.  We need to
-		 * stash the target dno into the expr anyway, so that it will be
-		 * available if we have to replan.)
-		 */
-		if (target->dtype == PLPGSQL_DTYPE_VAR)
-			expr->target_param = target->dno;
-		else
-			expr->target_param = -1;	/* should be that already */
-
 		exec_prepare_plan(estate, expr, 0);
-	}
 
 	value = exec_eval_expr(estate, expr, &isnull, &valtype, &valtypmod);
 	exec_assign_value(estate, target, value, isnull, valtype, valtypmod);
@@ -6282,13 +6262,6 @@ setup_param_list(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 		 * that they are interrupting an active use of parameters.
 		 */
 		paramLI->parserSetupArg = expr;
-
-		/*
-		 * Also make sure this is set before parser hooks need it.  There is
-		 * no need to save and restore, since the value is always correct once
-		 * set.  (Should be set already, but let's be sure.)
-		 */
-		expr->func = estate->func;
 	}
 	else
 	{
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 64d2c362bf..f55aefb100 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -61,6 +61,10 @@ static	bool			tok_is_keyword(int token, union YYSTYPE *lval,
 static	void			word_is_not_variable(PLword *word, int location, yyscan_t yyscanner);
 static	void			cword_is_not_variable(PLcword *cword, int location, yyscan_t yyscanner);
 static	void			current_token_is_not_variable(int tok, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yyscan_t yyscanner);
+static	PLpgSQL_expr	*make_plpgsql_expr(const char *query,
+										   RawParseMode parsemode);
+static	void			mark_expr_as_assignment_source(PLpgSQL_expr *expr,
+													   PLpgSQL_datum *target);
 static	PLpgSQL_expr	*read_sql_construct(int until,
 											int until2,
 											int until3,
@@ -536,6 +540,10 @@ decl_statement	: decl_varname decl_const decl_datatype decl_collate decl_notnull
 									 errmsg("variable \"%s\" must have a default value, since it's declared NOT NULL",
 											var->refname),
 									 parser_errposition(@5)));
+
+						if (var->default_val != NULL)
+							mark_expr_as_assignment_source(var->default_val,
+														   (PLpgSQL_datum *) var);
 					}
 				| decl_varname K_ALIAS K_FOR decl_aliasitem ';'
 					{
@@ -996,6 +1004,7 @@ stmt_assign		: T_DATUM
 													   false, true,
 													   NULL, NULL,
 													   &yylval, &yylloc, yyscanner);
+						mark_expr_as_assignment_source(new->expr, $1.datum);
 
 						$$ = (PLpgSQL_stmt *) new;
 					}
@@ -2651,6 +2660,38 @@ current_token_is_not_variable(int tok, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yysca
 		yyerror(yyllocp, NULL, yyscanner, "syntax error");
 }
 
+/* Convenience routine to construct a PLpgSQL_expr struct */
+static PLpgSQL_expr *
+make_plpgsql_expr(const char *query,
+				  RawParseMode parsemode)
+{
+	PLpgSQL_expr *expr = palloc0(sizeof(PLpgSQL_expr));
+
+	expr->query = pstrdup(query);
+	expr->parseMode = parsemode;
+	expr->func = plpgsql_curr_compile;
+	expr->ns = plpgsql_ns_top();
+	/* might get changed later during parsing: */
+	expr->target_param = -1;
+	/* other fields are left as zeroes until first execution */
+	return expr;
+}
+
+/* Mark a PLpgSQL_expr as being the source of an assignment to target */
+static void
+mark_expr_as_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
+{
+	/*
+	 * Mark the expression as being an assignment source, if target is a
+	 * simple variable.  We don't currently support optimized assignments to
+	 * other DTYPEs, so no need to mark in other cases.
+	 */
+	if (target->dtype == PLPGSQL_DTYPE_VAR)
+		expr->target_param = target->dno;
+	else
+		expr->target_param = -1;	/* should be that already */
+}
+
 /* Convenience routine to read an expression with one possible terminator */
 static PLpgSQL_expr *
 read_sql_expression(int until, const char *expected, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yyscan_t yyscanner)
@@ -2794,13 +2835,7 @@ read_sql_construct(int until,
 	 */
 	plpgsql_append_source_text(&ds, startlocation, endlocation, yyscanner);
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = parsemode;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, parsemode);
 	pfree(ds.data);
 
 	if (valid_sql)
@@ -3122,13 +3157,7 @@ make_execsql_stmt(int firsttoken, int location, PLword *word, YYSTYPE *yylvalp,
 	while (ds.len > 0 && scanner_isspace(ds.data[ds.len - 1]))
 		ds.data[--ds.len] = '\0';
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_DEFAULT;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_DEFAULT);
 	pfree(ds.data);
 
 	check_sql_expr(expr->query, expr->parseMode, location, yyscanner);
@@ -4006,13 +4035,7 @@ read_cursor_args(PLpgSQL_var *cursor, int until, YYSTYPE *yylvalp, YYLTYPE *yyll
 			appendStringInfoString(&ds, ", ");
 	}
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_PLPGSQL_EXPR;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_PLPGSQL_EXPR);
 	pfree(ds.data);
 
 	/* Next we'd better find the until token */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 441df5354e..b0052167ee 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -219,14 +219,22 @@ typedef struct PLpgSQL_expr
 {
 	char	   *query;			/* query string, verbatim from function body */
 	RawParseMode parseMode;		/* raw_parser() mode to use */
-	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
-	Bitmapset  *paramnos;		/* all dnos referenced by this query */
+	struct PLpgSQL_function *func;	/* function containing this expr */
+	struct PLpgSQL_nsitem *ns;	/* namespace chain visible to this expr */
 
-	/* function containing this expr (not set until we first parse query) */
-	struct PLpgSQL_function *func;
+	/*
+	 * These fields are used to help optimize assignments to expanded-datum
+	 * variables.  If this expression is the source of an assignment to a
+	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 */
+	int			target_param;	/* dno of assign target, or -1 if none */
 
-	/* namespace chain visible to this expr */
-	struct PLpgSQL_nsitem *ns;
+	/*
+	 * Fields above are set during plpgsql parsing.  Remaining fields are left
+	 * as zeroes/NULLs until we first parse/plan the query.
+	 */
+	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
+	Bitmapset  *paramnos;		/* all dnos referenced by this query */
 
 	/* fields for "simple expression" fast-path execution: */
 	Expr	   *expr_simple_expr;	/* NULL means not a simple expr */
@@ -235,14 +243,11 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * These fields are used to optimize assignments to expanded-datum
-	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno; else it's -1.
-	 * If we match a Param within expr_simple_expr to such a variable, that
-	 * Param's address is stored in expr_rw_param; then expression code
-	 * generation will allow the value for that Param to be passed read/write.
+	 * If we match a Param within expr_simple_expr to the variable identified
+	 * by target_param, that Param's address is stored in expr_rw_param; then
+	 * expression code generation will allow the value for that Param to be
+	 * passed as a read/write expanded-object pointer.
 	 */
-	int			target_param;	/* dno of assign target, or -1 if none */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
-- 
2.43.5



  [text/x-diff] v5-0003-Detect-whether-plpgsql-assignment-targets-are-loc.patch (10.2K, 4-v5-0003-Detect-whether-plpgsql-assignment-targets-are-loc.patch)
From 6f39d4631c211109904627accd64da2e3102bef2 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Sun, 2 Feb 2025 16:40:31 -0500
Subject: [PATCH v5 3/5] Detect whether plpgsql assignment targets are "local"
 variables.

Mark whether the target of a potentially optimizable assignment
is "local", in the sense of being declared inside any exception
block that could trap an error thrown from the assignment.
(This implies that we needn't preserve the variable's value
in case of an error.  This patch doesn't do anything with the
knowledge, but the next one will.)

Normally, this requires a post-parsing scan of the function's
parse tree, since we don't know while parsing a BEGIN ...
construct whether we will find EXCEPTION at its end.  However,
if there are no BEGIN ... EXCEPTION blocks in the function at
all, then all assignments are local, even those to variables
representing function arguments.  We optimize that common case
by initializing the target_is_local flags to "true", and fixing
them up with a post-scan only if we found EXCEPTION.

Note that variables' default-value expressions are never interesting
for expanded-variable optimization, since they couldn't contain a
reference to the target variable anyway.  But the code is set up
to compute their target_param and target_is_local correctly anyway,
for consistency and in case someone thinks of a use for that data.

I added a bit of plpgsql_dumptree support to help verify that
this code sets the flags as expected.  I'm not set on keeping
that, but I do want to keep the addition of a plpgsql_dumptree
call in plpgsql_compile_inline.  It's at best an oversight that
"#option dump" doesn't work in a DO block.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_comp.c  | 12 +++++
 src/pl/plpgsql/src/pl_funcs.c | 88 +++++++++++++++++++++++++++++++++++
 src/pl/plpgsql/src/pl_gram.y  | 15 ++++++
 src/pl/plpgsql/src/plpgsql.h  |  7 ++-
 4 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index a2de0880fb..f36a244140 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -371,6 +371,7 @@ do_compile(FunctionCallInfo fcinfo,
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	/*
 	 * Initialize the compiler, particularly the namespace stack.  The
@@ -811,6 +812,9 @@ do_compile(FunctionCallInfo fcinfo,
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
 	/* Debug dump for completed functions */
 	if (plpgsql_DumpExecTree)
 		plpgsql_dumptree(function);
@@ -906,6 +910,7 @@ plpgsql_compile_inline(char *proc_source)
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	plpgsql_ns_init();
 	plpgsql_ns_push(func_name, PLPGSQL_LABEL_BLOCK);
@@ -962,6 +967,13 @@ plpgsql_compile_inline(char *proc_source)
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
+	/* Debug dump for completed functions */
+	if (plpgsql_DumpExecTree)
+		plpgsql_dumptree(function);
+
 	/*
 	 * Pop the error context stack
 	 */
diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index 88e25b54bc..6b5394fc5f 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -598,6 +598,91 @@ plpgsql_statement_tree_walker_impl(PLpgSQL_stmt *stmt,
 }
 
 
+/**********************************************************************
+ * Mark assignment source expressions that have local target variables,
+ * that is, the target variable is declared within the exception block
+ * most closely containing the assignment itself.  (Such target variables
+ * need not be preserved if the assignment's source expression raises an
+ * error, since the variable will no longer be accessible afterwards.
+ * Detecting this allows better optimization.)
+ *
+ * This code need not be called if the plpgsql function contains no exception
+ * blocks, because mark_expr_as_assignment_source will have set all the flags
+ * to true already.  Also, we need not reconsider default-value expressions
+ * for variables, because variable declarations are necessarily within the
+ * nearest exception block.  (In DECLARE ... BEGIN ... EXCEPTION ... END, the
+ * variable initializations are done before entering the exception scope.)
+ *
+ * Within the recursion, local_dnos is a Bitmapset of dnos of variables
+ * known to be declared within the current exception level.
+ **********************************************************************/
+static void mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos);
+static void mark_expr(PLpgSQL_expr *expr, Bitmapset *local_dnos);
+
+static void
+mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos)
+{
+	if (stmt == NULL)
+		return;
+	if (stmt->cmd_type == PLPGSQL_STMT_BLOCK)
+	{
+		PLpgSQL_stmt_block *block = (PLpgSQL_stmt_block *) stmt;
+
+		if (block->exceptions)
+		{
+			/*
+			 * The block creates a new exception scope, so variables declared
+			 * at outer levels are nonlocal.  For that matter, so are any
+			 * variables declared in the block's DECLARE section.  Hence, we
+			 * must pass down empty local_dnos.
+			 */
+			plpgsql_statement_tree_walker(stmt, mark_stmt, mark_expr, NULL);
+		}
+		else
+		{
+			/*
+			 * Otherwise, the block does not create a new exception scope, and
+			 * any variables it declares can also be considered local within
+			 * it.  Note that only initializable datum types (VAR, REC) are
+			 * included in initvarnos; but that's sufficient for our purposes.
+			 */
+			local_dnos = bms_copy(local_dnos);
+			for (int i = 0; i < block->n_initvars; i++)
+				local_dnos = bms_add_member(local_dnos, block->initvarnos[i]);
+			plpgsql_statement_tree_walker(stmt, mark_stmt, mark_expr,
+										  local_dnos);
+			bms_free(local_dnos);
+		}
+	}
+	else
+		plpgsql_statement_tree_walker(stmt, mark_stmt, mark_expr, local_dnos);
+}
+
+static void
+mark_expr(PLpgSQL_expr *expr, Bitmapset *local_dnos)
+{
+	/*
+	 * If this expression has an assignment target, check whether the target
+	 * is local, and mark the expression accordingly.
+	 */
+	if (expr && expr->target_param >= 0)
+		expr->target_is_local = bms_is_member(expr->target_param, local_dnos);
+}
+
+void
+plpgsql_mark_local_assignment_targets(PLpgSQL_function *func)
+{
+	Bitmapset  *local_dnos;
+
+	/* Function parameters can be treated as local targets at outer level */
+	local_dnos = NULL;
+	for (int i = 0; i < func->fn_nargs; i++)
+		local_dnos = bms_add_member(local_dnos, func->fn_argvarnos[i]);
+	mark_stmt((PLpgSQL_stmt *) func->action, local_dnos);
+	bms_free(local_dnos);
+}
+
+
 /**********************************************************************
  * Release memory when a PL/pgSQL function is no longer needed
  *
@@ -1500,6 +1585,9 @@ static void
 dump_expr(PLpgSQL_expr *expr)
 {
 	printf("'%s'", expr->query);
+	if (expr->target_param >= 0)
+		printf(" target %d%s", expr->target_param,
+			   expr->target_is_local ? " (local)" : "");
 }
 
 void
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index f55aefb100..8048e040f8 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2328,6 +2328,8 @@ exception_sect	:
 						PLpgSQL_exception_block *new = palloc(sizeof(PLpgSQL_exception_block));
 						PLpgSQL_variable *var;
 
+						plpgsql_curr_compile->has_exception_block = true;
+
 						var = plpgsql_build_variable("sqlstate", lineno,
 													 plpgsql_build_datatype(TEXTOID,
 																			-1,
@@ -2673,6 +2675,7 @@ make_plpgsql_expr(const char *query,
 	expr->ns = plpgsql_ns_top();
 	/* might get changed later during parsing: */
 	expr->target_param = -1;
+	expr->target_is_local = false;
 	/* other fields are left as zeroes until first execution */
 	return expr;
 }
@@ -2687,9 +2690,21 @@ mark_expr_as_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
 	 * other DTYPEs, so no need to mark in other cases.
 	 */
 	if (target->dtype == PLPGSQL_DTYPE_VAR)
+	{
 		expr->target_param = target->dno;
+
+		/*
+		 * For now, assume the target is local to the nearest enclosing
+		 * exception block.  That's correct if the function contains no
+		 * exception blocks; otherwise we'll update this later.
+		 */
+		expr->target_is_local = true;
+	}
 	else
+	{
 		expr->target_param = -1;	/* should be that already */
+		expr->target_is_local = false; /* ditto */
+	}
 }
 
 /* Convenience routine to read an expression with one possible terminator */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index b0052167ee..2fa6d73cab 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -225,9 +225,12 @@ typedef struct PLpgSQL_expr
 	/*
 	 * These fields are used to help optimize assignments to expanded-datum
 	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 * simple variable, target_param holds that variable's dno (else it's -1),
+	 * and target_is_local indicates whether the target is declared inside the
+	 * closest exception block containing the assignment.
 	 */
 	int			target_param;	/* dno of assign target, or -1 if none */
+	bool		target_is_local;	/* is it within nearest exception block? */
 
 	/*
 	 * Fields above are set during plpgsql parsing.  Remaining fields are left
@@ -1014,6 +1017,7 @@ typedef struct PLpgSQL_function
 	/* data derived while parsing body */
 	unsigned int nstatements;	/* counter for assigning stmtids */
 	bool		requires_procedure_resowner;	/* contains CALL or DO? */
+	bool		has_exception_block;	/* contains BEGIN...EXCEPTION? */
 
 	/* these fields change when the function is used */
 	struct PLpgSQL_execstate *cur_estate;
@@ -1312,6 +1316,7 @@ extern PLpgSQL_nsitem *plpgsql_ns_find_nearest_loop(PLpgSQL_nsitem *ns_cur);
  */
 extern PGDLLEXPORT const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt);
 extern const char *plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind);
+extern void plpgsql_mark_local_assignment_targets(PLpgSQL_function *func);
 extern void plpgsql_free_function_memory(PLpgSQL_function *func);
 extern void plpgsql_dumptree(PLpgSQL_function *func);
 
-- 
2.43.5
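
The scoping rule implemented by mark_stmt and mark_expr in the patch above
can be sketched in standalone form as follows.  This is a toy model under
stated assumptions, not the patch's code: ToyNode and toy_mark are
hypothetical names, the tree has only assignments and blocks, and an
unsigned bitmask stands in for the Bitmapset, so dnos must be smaller than
the bit width of unsigned.  A block with an EXCEPTION clause passes down an
empty set; a block without one adds its own declared variables to the
inherited set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature tree; these are not the PostgreSQL structs. */
typedef enum ToyKind
{
	TOY_ASSIGN,
	TOY_BLOCK
} ToyKind;

typedef struct ToyNode
{
	ToyKind		kind;
	int			target_dno;		/* TOY_ASSIGN: dno of the assignment target */
	bool		target_is_local;	/* TOY_ASSIGN: set by toy_mark */
	bool		has_exception;	/* TOY_BLOCK: block has an EXCEPTION clause */
	unsigned	declared;		/* TOY_BLOCK: bitmask of dnos it declares */
	struct ToyNode **body;		/* TOY_BLOCK: child statements */
	int			nbody;
} ToyNode;

/*
 * local_dnos is the set of variables declared within the current exception
 * scope.  It is passed by value, so changes made for a block are visible
 * only to that block's children.
 */
static void
toy_mark(ToyNode *node, unsigned local_dnos)
{
	if (node == NULL)
		return;
	if (node->kind == TOY_ASSIGN)
	{
		node->target_is_local =
			((local_dnos >> node->target_dno) & 1u) != 0;
		return;
	}
	if (node->has_exception)
		local_dnos = 0;			/* new exception scope: nothing is local */
	else
		local_dnos |= node->declared;	/* block's own variables are local */
	for (int i = 0; i < node->nbody; i++)
		toy_mark(node->body[i], local_dnos);
}
```

As in plpgsql_mark_local_assignment_targets, the caller seeds local_dnos
with the function's argument variables before walking the outermost block.
In the real patch this pass is skipped entirely when the function has no
exception blocks, because the flags were already initialized to true during
parsing.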



  [text/x-diff] v5-0004-Implement-new-optimization-rule-for-updates-of-ex.patch (26.8K, 5-v5-0004-Implement-new-optimization-rule-for-updates-of-ex.patch)
From eb5f318fb43e61c958a35802c214820f7ff7711d Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Sun, 2 Feb 2025 16:41:45 -0500
Subject: [PATCH v5 4/5] Implement new optimization rule for updates of
 expanded variables.

If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/include/executor/execExpr.h               |   1 +
 src/pl/plpgsql/src/expected/plpgsql_array.out |   9 +
 src/pl/plpgsql/src/pl_exec.c                  | 383 +++++++++++++++---
 src/pl/plpgsql/src/plpgsql.h                  |  22 +-
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |   9 +
 src/tools/pgindent/typedefs.list              |   2 +
 6 files changed, 364 insertions(+), 62 deletions(-)

diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 51bd35dcb0..191d8fe34d 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -425,6 +425,7 @@ typedef struct ExprEvalStep
 		{
 			ExecEvalSubroutine paramfunc;	/* add-on evaluation subroutine */
 			void	   *paramarg;	/* private data for same */
+			void	   *paramarg2;	/* more private data for same */
 			int			paramid;	/* numeric ID for parameter */
 			Oid			paramtype;	/* OID of parameter's datatype */
 		}			cparam;
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index ad60e0e8be..e5db6d6087 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -52,6 +52,15 @@ NOTICE:  a = ("{""(,11)""}",), a.c1[1].i = 11
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 NOTICE:  a = {1,2,3}
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index fec1811ae1..28b6c85d8d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -251,6 +251,15 @@ static HTAB *shared_cast_hash = NULL;
 	else \
 		Assert(rc == PLPGSQL_RC_OK)
 
+/* State struct for count_param_references */
+typedef struct count_param_references_context
+{
+	int			paramid;
+	int			count;
+	Param	   *last_param;
+} count_param_references_context;
+
+
 /************************************************************
  * Local function forward declarations
  ************************************************************/
@@ -336,7 +345,9 @@ static void exec_prepare_plan(PLpgSQL_execstate *estate,
 static void exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr);
 static bool exec_is_simple_query(PLpgSQL_expr *expr);
 static void exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan);
-static void exec_check_rw_parameter(PLpgSQL_expr *expr);
+static void exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid);
+static bool count_param_references(Node *node,
+								   count_param_references_context *context);
 static void exec_check_assignable(PLpgSQL_execstate *estate, int dno);
 static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
 								  PLpgSQL_expr *expr,
@@ -384,6 +395,10 @@ static ParamExternData *plpgsql_param_fetch(ParamListInfo params,
 static void plpgsql_param_compile(ParamListInfo params, Param *param,
 								  ExprState *state,
 								  Datum *resv, bool *resnull);
+static void plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+										 ExprContext *econtext);
+static void plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+											ExprContext *econtext);
 static void plpgsql_param_eval_var(ExprState *state, ExprEvalStep *op,
 								   ExprContext *econtext);
 static void plpgsql_param_eval_var_ro(ExprState *state, ExprEvalStep *op,
@@ -6078,10 +6093,13 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 
 		/*
 		 * Reset to "not simple" to leave sane state (with no dangling
-		 * pointers) in case we fail while replanning.  expr_simple_plansource
-		 * can be left alone however, as that cannot move.
+		 * pointers) in case we fail while replanning.  We'll need to
+		 * re-determine simplicity and R/W optimizability anyway, since those
+		 * could change with the new plan.  expr_simple_plansource can be left
+		 * alone however, as that cannot move.
 		 */
 		expr->expr_simple_expr = NULL;
+		expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 		expr->expr_rw_param = NULL;
 		expr->expr_simple_plan = NULL;
 		expr->expr_simple_plan_lxid = InvalidLocalTransactionId;
@@ -6439,16 +6457,27 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	scratch.resnull = resnull;
 
 	/*
-	 * Select appropriate eval function.  It seems worth special-casing
-	 * DTYPE_VAR and DTYPE_RECFIELD for performance.  Also, we can determine
-	 * in advance whether MakeExpandedObjectReadOnly() will be required.
-	 * Currently, only VAR/PROMISE and REC datums could contain read/write
-	 * expanded objects.
+	 * Select appropriate eval function.
+	 *
+	 * First, if this Param references the same varlena-type DTYPE_VAR datum
+	 * that is the target of the assignment containing this simple expression,
+	 * then it's possible we will be able to optimize handling of R/W expanded
+	 * datums.  We don't want to do the work needed to determine that unless
+	 * we actually see a R/W expanded datum at runtime, so install a checking
+	 * function that will figure that out when needed.
+	 *
+	 * Otherwise, it seems worth special-casing DTYPE_VAR and DTYPE_RECFIELD
+	 * for performance.  Also, we can determine in advance whether
+	 * MakeExpandedObjectReadOnly() will be required.  Currently, only
+	 * VAR/PROMISE and REC datums could contain read/write expanded objects.
 	 */
 	if (datum->dtype == PLPGSQL_DTYPE_VAR)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		bool		isvarlena = (((PLpgSQL_var *) datum)->datatype->typlen == -1);
+
+		if (isvarlena && dno == expr->target_param && expr->expr_simple_expr)
+			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_check;
+		else if (isvarlena)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var;
@@ -6457,14 +6486,12 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_recfield;
 	else if (datum->dtype == PLPGSQL_DTYPE_PROMISE)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		if (((PLpgSQL_var *) datum)->datatype->typlen == -1)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
 	}
-	else if (datum->dtype == PLPGSQL_DTYPE_REC &&
-			 param != expr->expr_rw_param)
+	else if (datum->dtype == PLPGSQL_DTYPE_REC)
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 	else
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
@@ -6473,14 +6500,177 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	 * Note: it's tempting to use paramarg to store the estate pointer and
 	 * thereby save an indirection or two in the eval functions.  But that
 	 * doesn't work because the compiled expression might be used with
-	 * different estates for the same PL/pgSQL function.
+	 * different estates for the same PL/pgSQL function.  Instead, store
+	 * pointers to the PLpgSQL_expr as well as this specific Param, to support
+	 * plpgsql_param_eval_var_check().
 	 */
-	scratch.d.cparam.paramarg = NULL;
+	scratch.d.cparam.paramarg = expr;
+	scratch.d.cparam.paramarg2 = param;
 	scratch.d.cparam.paramid = param->paramid;
 	scratch.d.cparam.paramtype = param->paramtype;
 	ExprEvalPushStep(state, &scratch);
 }
 
+/*
+ * plpgsql_param_eval_var_check		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we may need to determine the applicability of a read/write optimization,
+ * but we've not done that yet.  The work to determine applicability will
+ * be done at most once (per construction of the PL/pgSQL function's cache
+ * entry) when we first see that the target variable's old value is a R/W
+ * expanded object.  If we never do see that, nothing is lost: the amount
+ * of work done by this function in that case is just about the same as
+ * what would be done by plpgsql_param_eval_var_ro, which is what we'd
+ * have used otherwise.
+ */
+static void
+plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+							 ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, it's time to
+	 * decide whether/how to optimize the assignment.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		PLpgSQL_expr *expr = (PLpgSQL_expr *) op->d.cparam.paramarg;
+		Param	   *param = (Param *) op->d.cparam.paramarg2;
+
+		/*
+		 * We might have already figured this out while evaluating some other
+		 * Param referencing the same variable, so check expr_rwopt first.
+		 */
+		if (expr->expr_rwopt == PLPGSQL_RWOPT_UNKNOWN)
+			exec_check_rw_parameter(expr, op->d.cparam.paramid);
+
+		/*
+		 * Update the callback pointer to match what we decided to do, so that
+		 * this function will not be called again.  Then pass off this
+		 * execution to the newly-selected function.
+		 */
+		switch (expr->expr_rwopt)
+		{
+			case PLPGSQL_RWOPT_UNKNOWN:
+				Assert(false);
+				break;
+			case PLPGSQL_RWOPT_NOPE:
+				/* Force the value to read-only in all future executions */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+				plpgsql_param_eval_var_ro(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_TRANSFER:
+				/* There can be only one matching Param in this case */
+				Assert(param == expr->expr_rw_param);
+				/* When the value is read/write, transfer to exec context */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_transfer;
+				plpgsql_param_eval_var_transfer(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_INPLACE:
+				if (param == expr->expr_rw_param)
+				{
+					/* When the value is read/write, deliver it as-is */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var;
+					plpgsql_param_eval_var(state, op, econtext);
+				}
+				else
+				{
+					/* Not the optimizable reference, so force to read-only */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+					plpgsql_param_eval_var_ro(state, op, econtext);
+				}
+				break;
+		}
+		return;
+	}
+
+	/*
+	 * Otherwise, continue to postpone that decision, and execute an inlined
+	 * version of exec_eval_datum().  Although this value could potentially
+	 * need MakeExpandedObjectReadOnly, we know it doesn't right now.
+	 */
+	*op->resvalue = var->value;
+	*op->resnull = var->isnull;
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
+/*
+ * plpgsql_param_eval_var_transfer		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we have determined that a read/write expanded value can be handed off
+ * into execution of the expression (and then possibly returned to our
+ * function's ownership afterwards).  We have to test though, because the
+ * variable might not contain a read/write expanded value during this
+ * execution.
+ */
+static void
+plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+								ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, transfer its
+	 * ownership into the expression execution context, then drop our own
+	 * reference to the value by setting the variable to NULL.  That'll be
+	 * overwritten (perhaps with this same object) when control comes back
+	 * from the expression.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		*op->resvalue = TransferExpandedObject(var->value,
+											   get_eval_mcontext(estate));
+		*op->resnull = false;
+
+		var->value = (Datum) 0;
+		var->isnull = true;
+		var->freeval = false;
+	}
+	else
+	{
+		/*
+		 * Otherwise we can pass the variable's value directly; we now know
+		 * that MakeExpandedObjectReadOnly isn't needed.
+		 */
+		*op->resvalue = var->value;
+		*op->resnull = var->isnull;
+	}
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
 /*
  * plpgsql_param_eval_var		evaluation of EEOP_PARAM_CALLBACK step
  *
@@ -7957,9 +8147,10 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 	MemoryContext oldcontext;
 
 	/*
-	 * Initialize to "not simple".
+	 * Initialize to "not simple", and reset R/W optimizability.
 	 */
 	expr->expr_simple_expr = NULL;
+	expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 	expr->expr_rw_param = NULL;
 
 	/*
@@ -8164,88 +8355,133 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
 	expr->expr_simple_typmod = exprTypmod((Node *) tle_expr);
 	/* We also want to remember if it is immutable or not */
 	expr->expr_simple_mutable = contain_mutable_functions((Node *) tle_expr);
-
-	/*
-	 * Lastly, check to see if there's a possibility of optimizing a
-	 * read/write parameter.
-	 */
-	exec_check_rw_parameter(expr);
 }
 
 /*
  * exec_check_rw_parameter --- can we pass expanded object as read/write param?
  *
- * If we have an assignment like "x := array_append(x, foo)" in which the
+ * There are two separate cases in which we can optimize an update to a
+ * variable that has a read/write expanded value by letting the called
+ * expression operate directly on the expanded value.  In both cases we
+ * are considering assignments like "var := array_append(var, foo)" where
+ * the assignment target is also an input to the RHS expression.
+ *
+ * Case 1 (RWOPT_TRANSFER rule): if the variable is "local" in the sense that
+ * its declaration is not outside any BEGIN...EXCEPTION block surrounding the
+ * assignment, then we do not need to worry about preserving its value if the
+ * RHS expression throws an error.  If in addition the variable is referenced
+ * exactly once in the RHS expression, then we can optimize by converting the
+ * read/write expanded value into a transient value within the expression
+ * evaluation context, and then setting the variable's recorded value to NULL
+ * to prevent double-free attempts.  This works regardless of any other
+ * details of the RHS expression.  If the expression eventually returns that
+ * same expanded object (possibly modified) then the variable will re-acquire
+ * ownership; while if it returns something else or throws an error, the
+ * expanded object will be discarded as part of cleanup of the evaluation
+ * context.
+ *
+ * Case 2 (RWOPT_INPLACE rule): if we have a non-local assignment or if
+ * it looks like "var := array_append(var, var[1])" with multiple references
+ * to the target variable, then we can't use case 1.  Nonetheless, if the
  * top-level function is trusted not to corrupt its argument in case of an
- * error, then when x has an expanded object as value, it is safe to pass the
- * value as a read/write pointer and let the function modify the value
- * in-place.
+ * error, then when the var has an expanded object as value, it is safe to
+ * pass the value as a read/write pointer to the top-level function and let
+ * the function modify the value in-place.  (Any other references have to be
+ * passed as read-only pointers as usual.)  Only the top-level function has to
+ * be trusted, since if anything further down fails, the object hasn't been
+ * modified yet.
  *
- * This function checks for a safe expression, and sets expr->expr_rw_param
- * to the address of any Param within the expression that can be passed as
- * read/write (there can be only one); or to NULL when there is no safe Param.
+ * This function checks to see if the assignment is optimizable according
+ * to either rule, and updates expr->expr_rwopt accordingly.  In addition,
+ * it sets expr->expr_rw_param to the address of the Param within the
+ * expression that can be passed as read/write (there can be only one);
+ * or to NULL when there is no safe Param.
  *
- * Note that this mechanism intentionally applies the safety labeling to just
- * one Param; the expression could contain other Params referencing the target
- * variable, but those must still be treated as read-only.
+ * Note that this mechanism intentionally allows just one Param to emit a
+ * read/write pointer; in case 2, the expression could contain other Params
+ * referencing the target variable, but those must be treated as read-only.
  *
  * Also note that we only apply this optimization within simple expressions.
  * There's no point in it for non-simple expressions, because the
  * exec_run_select code path will flatten any expanded result anyway.
- * Also, it's safe to assume that an expr_simple_expr tree won't get copied
- * somewhere before it gets compiled, so that looking for pointer equality
- * to expr_rw_param will work for matching the target Param.  That'd be much
- * shakier in the general case.
  */
 static void
-exec_check_rw_parameter(PLpgSQL_expr *expr)
+exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 {
-	int			target_dno;
+	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
 	ListCell   *lc;
 
 	/* Assume unsafe */
+	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
 	expr->expr_rw_param = NULL;
 
-	/* Done if expression isn't an assignment source */
-	target_dno = expr->target_param;
-	if (target_dno < 0)
-		return;
+	/* Shouldn't be here for non-simple expression */
+	Assert(sexpr != NULL);
+
+	/* Param should match the expression's assignment target, too */
+	Assert(paramid == expr->target_param + 1);
 
 	/*
-	 * If target variable isn't referenced by expression, no need to look
-	 * further.
+	 * If the assignment is to a "local" variable (one whose value won't
+	 * matter anymore if expression evaluation fails), and this Param is the
+	 * only reference to that variable in the expression, then we can
+	 * unconditionally optimize using the "transfer" method.
 	 */
-	if (!bms_is_member(target_dno, expr->paramnos))
-		return;
+	if (expr->target_is_local)
+	{
+		count_param_references_context context;
 
-	/* Shouldn't be here for non-simple expression */
-	Assert(expr->expr_simple_expr != NULL);
+		/* See how many references there are, and find one of them */
+		context.paramid = paramid;
+		context.count = 0;
+		context.last_param = NULL;
+		(void) count_param_references((Node *) sexpr, &context);
+
+		/* If we're here, the expr must contain some reference to the var */
+		Assert(context.count > 0);
+
+		/* If exactly one reference, success! */
+		if (context.count == 1)
+		{
+			expr->expr_rwopt = PLPGSQL_RWOPT_TRANSFER;
+			expr->expr_rw_param = context.last_param;
+			return;
+		}
+	}
 
 	/*
+	 * Otherwise, see if we can trust the expression's top-level function to
+	 * apply the "inplace" method.
+	 *
 	 * Top level of expression must be a simple FuncExpr, OpExpr, or
-	 * SubscriptingRef, else we can't optimize.
+	 * SubscriptingRef, else we can't identify which function is relevant.  But
+	 * it's okay to look through any RelabelType above that, since that can't
+	 * fail.
 	 */
-	if (IsA(expr->expr_simple_expr, FuncExpr))
+	if (IsA(sexpr, RelabelType))
+		sexpr = ((RelabelType *) sexpr)->arg;
+	if (IsA(sexpr, FuncExpr))
 	{
-		FuncExpr   *fexpr = (FuncExpr *) expr->expr_simple_expr;
+		FuncExpr   *fexpr = (FuncExpr *) sexpr;
 
 		funcid = fexpr->funcid;
 		fargs = fexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, OpExpr))
+	else if (IsA(sexpr, OpExpr))
 	{
-		OpExpr	   *opexpr = (OpExpr *) expr->expr_simple_expr;
+		OpExpr	   *opexpr = (OpExpr *) sexpr;
 
 		funcid = opexpr->opfuncid;
 		fargs = opexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, SubscriptingRef))
+	else if (IsA(sexpr, SubscriptingRef))
 	{
-		SubscriptingRef *sbsref = (SubscriptingRef *) expr->expr_simple_expr;
+		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
 		/* We only trust standard varlena arrays to be safe */
+		/* TODO: install some extensibility here */
 		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
 			F_ARRAY_SUBSCRIPT_HANDLER)
 			return;
@@ -8256,9 +8492,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) sbsref->refexpr;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8293,9 +8530,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) arg;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8303,6 +8541,35 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 	}
 }
 
+/*
+ * Count Params referencing the specified paramid, and return one of them
+ * if there are any.
+ *
+ * We actually only need to distinguish 0, 1, and N references; so we can
+ * abort the tree traversal as soon as we've found two.
+ */
+static bool
+count_param_references(Node *node, count_param_references_context *context)
+{
+	if (node == NULL)
+		return false;
+	else if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		if (param->paramkind == PARAM_EXTERN &&
+			param->paramid == context->paramid)
+		{
+			context->last_param = param;
+			if (++(context->count) > 1)
+				return true;	/* abort tree traversal */
+		}
+		return false;
+	}
+	else
+		return expression_tree_walker(node, count_param_references, context);
+}
+
 /*
  * exec_check_assignable --- is it OK to assign to the indicated datum?
  *
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 2fa6d73cab..d73996e09c 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -187,6 +187,17 @@ typedef enum PLpgSQL_resolve_option
 	PLPGSQL_RESOLVE_COLUMN,		/* prefer table column to plpgsql var */
 } PLpgSQL_resolve_option;
 
+/*
+ * Status of optimization of assignment to a read/write expanded object
+ */
+typedef enum PLpgSQL_rwopt
+{
+	PLPGSQL_RWOPT_UNKNOWN = 0,	/* applicability not determined yet */
+	PLPGSQL_RWOPT_NOPE,			/* cannot do any optimization */
+	PLPGSQL_RWOPT_TRANSFER,		/* transfer the old value into expr state */
+	PLPGSQL_RWOPT_INPLACE,		/* pass value as R/W to top-level function */
+} PLpgSQL_rwopt;
+
 
 /**********************************************************************
  * Node and structure definitions
@@ -246,11 +257,14 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * If we match a Param within expr_simple_expr to the variable identified
-	 * by target_param, that Param's address is stored in expr_rw_param; then
-	 * expression code generation will allow the value for that Param to be
-	 * passed as a read/write expanded-object pointer.
+	 * expr_rwopt tracks whether we have determined that assignment to a
+	 * read/write expanded object (stored in the target_param datum) can be
+	 * optimized by passing it to the expr as a read/write expanded-object
+	 * pointer.  If so, expr_rw_param identifies the specific Param that
+	 * should emit a read/write pointer; any others will emit read-only
+	 * pointers.
 	 */
+	PLpgSQL_rwopt expr_rwopt;	/* can we apply R/W optimization? */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4b9ff51594..4a346203dc 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -48,6 +48,15 @@ begin a.c1[1].i := 11; raise notice 'a = %, a.c1[1].i = %', a, a.c1[1].i; end$$;
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+
 create temp table onecol as select array[1,2] as f1;
 
 do $$ declare a int[];
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9a3bee93de..4d4bf62b6e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1873,6 +1873,7 @@ PLpgSQL_rec
 PLpgSQL_recfield
 PLpgSQL_resolve_option
 PLpgSQL_row
+PLpgSQL_rwopt
 PLpgSQL_stmt
 PLpgSQL_stmt_assert
 PLpgSQL_stmt_assign
@@ -3414,6 +3415,7 @@ core_yy_extra_type
 core_yyscan_t
 corrupt_items
 cost_qual_eval_context
+count_param_references_context
 cp_hash_func
 create_upper_paths_hook_type
 createdb_failure_params
-- 
2.43.5
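
Reading aid, not part of the patch: the decision that
exec_check_rw_parameter() makes in v5-0004 can be modeled in a few lines
of standalone C.  This is only a sketch under stated assumptions.  The
AssignmentFacts struct and choose_rwopt() are hypothetical names that do
not exist in PostgreSQL, and the real code derives these facts by walking
the expression tree rather than receiving them as flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the PLpgSQL_rwopt values added by the patch (names shortened). */
typedef enum
{
	RWOPT_UNKNOWN = 0,			/* applicability not determined yet */
	RWOPT_NOPE,					/* cannot do any optimization */
	RWOPT_TRANSFER,				/* transfer the old value into expr state */
	RWOPT_INPLACE				/* pass value as R/W to top-level function */
} RwOpt;

/* Hypothetical summary of the facts the real function inspects. */
typedef struct AssignmentFacts
{
	bool		target_is_local;	/* no BEGIN...EXCEPTION block between the
									 * declaration and the assignment */
	int			target_refs;		/* references to the target in the RHS */
	bool		toplevel_trusted;	/* top-level function is known not to
									 * corrupt its argument on error */
	bool		target_is_direct_arg;	/* target is a direct argument of
										 * that top-level function */
} AssignmentFacts;

static RwOpt
choose_rwopt(const AssignmentFacts *f)
{
	/* We only get here if the RHS references the target at all */
	assert(f->target_refs > 0);

	/* Case 1: local variable referenced exactly once */
	if (f->target_is_local && f->target_refs == 1)
		return RWOPT_TRANSFER;

	/* Case 2: trusted top-level function taking the target directly */
	if (f->toplevel_trusted && f->target_is_direct_arg)
		return RWOPT_INPLACE;

	return RWOPT_NOPE;
}
```

Under this model, the three assignments in the new plpgsql_array test map
to TRANSFER ("a := array_append(a, 42)"), INPLACE ("a := a || a[3]") and
NOPE ("a := a || a"), matching the comments in the test script.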



  [text/x-diff] v5-0005-Allow-extension-functions-to-participate-in-in-pl.patch (17.1K, 6-v5-0005-Allow-extension-functions-to-participate-in-in-pl.patch)
  download | inline diff:
From 02955d5e539888f0da1b92ce273f9ea049361d4a Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Sun, 2 Feb 2025 16:42:12 -0500
Subject: [PATCH v5 5/5] Allow extension functions to participate in in-place
 updates.

Commit 1dc5ebc90 allowed PL/pgSQL to perform in-place updates
of expanded-object variables that are being updated with
assignments like "x := f(x, ...)".  However this was allowed
only for a hard-wired list of functions f(), since we need to
be sure that f() will not modify the variable if it fails.
It was always envisioned that we should make that extensible,
but at the time we didn't have a good way to do so.  Since
then we've invented the idea of "support functions" to allow
attaching specialized optimization knowledge to functions,
and that is a perfect mechanism for doing this.

Hence, adjust PL/pgSQL to use a support function request instead
of hard-wired logic to decide if in-place update is safe.
Preserve the previous optimizations by creating support functions
for the three functions that were previously hard-wired.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/backend/utils/adt/array_userfuncs.c       | 61 +++++++++++++
 src/backend/utils/adt/arraysubs.c             | 34 ++++++++
 src/include/catalog/pg_proc.dat               | 20 +++--
 src/include/nodes/supportnodes.h              | 55 +++++++++++-
 src/pl/plpgsql/src/expected/plpgsql_array.out |  3 +-
 src/pl/plpgsql/src/pl_exec.c                  | 86 ++++++++-----------
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |  1 +
 src/tools/pgindent/typedefs.list              |  1 +
 8 files changed, 202 insertions(+), 59 deletions(-)
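
Reading aid, not part of the patch: the three support functions added
below share one shape, namely "return the Param at a fixed argument
position if it is a direct PARAM_EXTERN reference to the assignment
target, else NULL".  The standalone sketch below models that check with
mock types so it compiles without the PostgreSQL headers.  MockParam,
MockModifyInPlaceRequest and mock_modify_in_place() are hypothetical
stand-ins, not the real Param, SupportRequestModifyInPlace or any
function in the tree, and a plain array stands in for the List of args.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for node tags and Param kinds; NOT the real definitions. */
typedef enum { MOCK_T_PARAM, MOCK_T_OTHER } MockNodeTag;
typedef enum { MOCK_PARAM_EXTERN, MOCK_PARAM_EXEC } MockParamKind;

typedef struct MockParam
{
	MockNodeTag tag;			/* MOCK_T_PARAM if this node is a Param */
	MockParamKind paramkind;
	int			paramid;
} MockParam;

typedef struct MockModifyInPlaceRequest
{
	MockParam **args;			/* top-level arguments of the function */
	int			nargs;
	int			paramid;		/* ID of Param(s) representing variable */
} MockModifyInPlaceRequest;

/*
 * Return the argument at position argpos if it is a direct reference to
 * the assignment target, else NULL.  argpos 0 models array_append_support
 * and argpos 1 models array_prepend_support.
 */
static MockParam *
mock_modify_in_place(const MockModifyInPlaceRequest *req, int argpos)
{
	MockParam  *arg;

	assert(req != NULL);
	if (argpos < 0 || argpos >= req->nargs)
		return NULL;

	arg = req->args[argpos];
	if (arg != NULL &&
		arg->tag == MOCK_T_PARAM &&
		arg->paramkind == MOCK_PARAM_EXTERN &&
		arg->paramid == req->paramid)
		return arg;

	return NULL;				/* in-place update not known to be safe */
}
```

Returning the Param's address rather than a boolean matters because the
caller stores it in expr_rw_param; only that one reference is handed a
read/write pointer, and any others stay read-only.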

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 0b02fe3744..2aae2f8ed9 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -16,6 +16,7 @@
 #include "common/int.h"
 #include "common/pg_prng.h"
 #include "libpq/pqformat.h"
+#include "nodes/supportnodes.h"
 #include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
@@ -167,6 +168,36 @@ array_append(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_append_support()
+ *
+ * Planner support function for array_append()
+ */
+Datum
+array_append_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place appends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) linitial(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_prepend :
  *		push an element onto the front of a one-dimensional array
@@ -230,6 +261,36 @@ array_prepend(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+/*
+ * array_prepend_support()
+ *
+ * Planner support function for array_prepend()
+ */
+Datum
+array_prepend_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place prepends if the function's array argument
+		 * is the array being assigned to.  We don't need to worry about array
+		 * references within the other argument.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *arg = (Param *) lsecond(req->args);
+
+		if (arg && IsA(arg, Param) &&
+			arg->paramkind == PARAM_EXTERN &&
+			arg->paramid == req->paramid)
+			ret = (Node *) arg;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*-----------------------------------------------------------------------------
  * array_cat :
  *		concatenate two nD arrays to form an nD array, or
diff --git a/src/backend/utils/adt/arraysubs.c b/src/backend/utils/adt/arraysubs.c
index 562179b379..2940fb8e8d 100644
--- a/src/backend/utils/adt/arraysubs.c
+++ b/src/backend/utils/adt/arraysubs.c
@@ -18,6 +18,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/subscripting.h"
+#include "nodes/supportnodes.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_expr.h"
 #include "utils/array.h"
@@ -575,3 +576,36 @@ raw_array_subscript_handler(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(&sbsroutines);
 }
+
+/*
+ * array_subscript_handler_support()
+ *
+ * Planner support function for array_subscript_handler()
+ */
+Datum
+array_subscript_handler_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestModifyInPlace))
+	{
+		/*
+		 * We can optimize in-place subscripted assignment if the refexpr is
+		 * the array being assigned to.  We don't need to worry about array
+		 * references within the refassgnexpr or the subscripts; however, if
+		 * there's no refassgnexpr then it's a fetch, which there's no need to
+		 * optimize.
+		 */
+		SupportRequestModifyInPlace *req = (SupportRequestModifyInPlace *) rawreq;
+		Param	   *refexpr = (Param *) linitial(req->args);
+
+		if (refexpr && IsA(refexpr, Param) &&
+			refexpr->paramkind == PARAM_EXTERN &&
+			refexpr->paramid == req->paramid &&
+			lsecond(req->args) != NULL)
+			ret = (Node *) refexpr;
+	}
+
+	PG_RETURN_POINTER(ret);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a5..9e803d610d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1598,14 +1598,20 @@
   proname => 'cardinality', prorettype => 'int4', proargtypes => 'anyarray',
   prosrc => 'array_cardinality' },
 { oid => '378', descr => 'append element onto end of array',
-  proname => 'array_append', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_append', prosupport => 'array_append_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_append' },
+{ oid => '8680', descr => 'planner support for array_append',
+  proname => 'array_append_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_append_support' },
 { oid => '379', descr => 'prepend element onto front of array',
-  proname => 'array_prepend', proisstrict => 'f',
-  prorettype => 'anycompatiblearray',
+  proname => 'array_prepend', prosupport => 'array_prepend_support',
+  proisstrict => 'f', prorettype => 'anycompatiblearray',
   proargtypes => 'anycompatible anycompatiblearray',
   prosrc => 'array_prepend' },
+{ oid => '8681', descr => 'planner support for array_prepend',
+  proname => 'array_prepend_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_prepend_support' },
 { oid => '383',
   proname => 'array_cat', proisstrict => 'f',
   prorettype => 'anycompatiblearray',
@@ -12207,8 +12213,12 @@
 
 # subscripting support for built-in types
 { oid => '6179', descr => 'standard array subscripting support',
-  proname => 'array_subscript_handler', prorettype => 'internal',
+  proname => 'array_subscript_handler',
+  prosupport => 'array_subscript_handler_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'array_subscript_handler' },
+{ oid => '8682', descr => 'planner support for array_subscript_handler',
+  proname => 'array_subscript_handler_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'array_subscript_handler_support' },
 { oid => '6180', descr => 'raw array subscripting support',
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index ad5d43a2a7..9c047cc401 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -6,10 +6,10 @@
  * This file defines the API for "planner support functions", which
  * are SQL functions (normally written in C) that can be attached to
  * another "target" function to give the system additional knowledge
- * about the target function.  All the current capabilities have to do
- * with planning queries that use the target function, though it is
- * possible that future extensions will add functionality to be invoked
- * by the parser or executor.
+ * about the target function.  The name is now something of a misnomer,
+ * since some of the call sites are in the executor, not the planner,
+ * but "function support function" would be a confusing name so we
+ * stick with "planner support function".
  *
  * A support function must have the SQL signature
  *		supportfn(internal) returns internal
@@ -343,4 +343,51 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+/*
+ * The ModifyInPlace request allows the support function to detect whether
+ * a call to its target function can be allowed to modify a read/write
+ * expanded object in-place.  The context is that we are considering a
+ * PL/pgSQL (or similar PL) assignment of the form "x := f(x, ...)" where
+ * the variable x is of a type that can be represented as an expanded object
+ * (see utils/expandeddatum.h).  If f() can usefully optimize by modifying
+ * the passed-in object in-place, then this request can be implemented to
+ * instruct PL/pgSQL to pass a read-write expanded pointer to the variable's
+ * value.  (Note that there is no guarantee that later calls to f() will
+ * actually do so.  If f() receives a read-only pointer, or a pointer to a
+ * non-expanded object, it must follow the usual convention of not modifying
+ * the pointed-to object.)  There are two requirements that must be met
+ * to make this safe:
+ * 1. f() must guarantee that it will not have modified the object if it
+ * fails.  Otherwise the variable's value might change unexpectedly.
+ * 2. If the other arguments to f() ("..." in the above example) contain
+ * references to x, f() must be able to cope with that; or if that's not
+ * safe, the support function must scan the other arguments to verify that
+ * there are no other references to x.  An example of the concern here is
+ * that in "arr := array_append(arr, arr[1])", if the array element type
+ * is pass-by-reference then array_append would receive a second argument
+ * that points into the array object it intends to modify.  array_append is
+ * coded to make that safe, but other functions might not be able to cope.
+ *
+ * "args" is a node tree list representing the function's arguments.
+ * One or more nodes within the node tree will be PARAM_EXTERN Params
+ * with ID "paramid", which represent the assignment target variable.
+ * (Note that such references are not necessarily at top level in the list,
+ * for example we might have "x := f(x, g(x))".  Generally it's only safe
+ * to optimize a reference that is at top level, else we're making promises
+ * about the behavior of g() as well as f().)
+ *
+ * If modify-in-place is safe, the support function should return the
+ * address of the Param node that is to return a read-write pointer.
+ * (At most one of the references is allowed to do so.)  Otherwise,
+ * return NULL.
+ */
+typedef struct SupportRequestModifyInPlace
+{
+	NodeTag		type;
+
+	Oid			funcid;			/* PG_PROC OID of the target function */
+	List	   *args;			/* Arguments to the function */
+	int			paramid;		/* ID of Param(s) representing variable */
+} SupportRequestModifyInPlace;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index e5db6d6087..4c6b3ce998 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -57,10 +57,11 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
-NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
+NOTICE:  a = {1,1,2,3,42,3,1,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 28b6c85d8d..d4377ceecb 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -29,6 +29,7 @@
 #include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
@@ -8411,7 +8412,7 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
-	ListCell   *lc;
+	Oid			prosupport;
 
 	/* Assume unsafe */
 	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
@@ -8480,64 +8481,51 @@ exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 	{
 		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
-		/* We only trust standard varlena arrays to be safe */
-		/* TODO: install some extensibility here */
-		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
-			F_ARRAY_SUBSCRIPT_HANDLER)
-			return;
-
-		/* We can optimize the refexpr if it's the target, otherwise not */
-		if (sbsref->refexpr && IsA(sbsref->refexpr, Param))
-		{
-			Param	   *param = (Param *) sbsref->refexpr;
+		funcid = get_typsubscript(sbsref->refcontainertype, NULL);
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
-
-		return;
+		/*
+		 * We assume that only the refexpr and refassgnexpr (if any) are
+		 * relevant to the support function's decision.  If that turns out to
+		 * be a bad idea, we could incorporate the subscript expressions into
+		 * the fargs list somehow.
+		 */
+		fargs = list_make2(sbsref->refexpr, sbsref->refassgnexpr);
 	}
 	else
 		return;
 
 	/*
-	 * The top-level function must be one that we trust to be "safe".
-	 * Currently we hard-wire the list, but it would be very desirable to
-	 * allow extensions to mark their functions as safe ...
+	 * The top-level function must be one that can handle in-place update
+	 * safely.  We allow functions to declare their ability to do that via a
+	 * support function request.
 	 */
-	if (!(funcid == F_ARRAY_APPEND ||
-		  funcid == F_ARRAY_PREPEND))
-		return;
-
-	/*
-	 * The target variable (in the form of a Param) must appear as a direct
-	 * argument of the top-level function.  References further down in the
-	 * tree can't be optimized; but on the other hand, they don't invalidate
-	 * optimizing the top-level call, since that will be executed last.
-	 */
-	foreach(lc, fargs)
+	prosupport = get_func_support(funcid);
+	if (OidIsValid(prosupport))
 	{
-		Node	   *arg = (Node *) lfirst(lc);
+		SupportRequestModifyInPlace req;
+		Param	   *param;
 
-		if (arg && IsA(arg, Param))
-		{
-			Param	   *param = (Param *) arg;
+		req.type = T_SupportRequestModifyInPlace;
+		req.funcid = funcid;
+		req.args = fargs;
+		req.paramid = paramid;
 
-			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == paramid)
-			{
-				/* Found the Param we want to pass as read/write */
-				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
-				expr->expr_rw_param = param;
-				return;
-			}
-		}
+		param = (Param *)
+			DatumGetPointer(OidFunctionCall1(prosupport,
+											 PointerGetDatum(&req)));
+
+		if (param == NULL)
+			return;				/* support function fails */
+
+		/* Verify support function followed the API */
+		Assert(IsA(param, Param));
+		Assert(param->paramkind == PARAM_EXTERN);
+		Assert(param->paramid == paramid);
+
+		/* Found the Param we want to pass as read/write */
+		expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
+		expr->expr_rw_param = param;
+		return;
 	}
 }
 
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4a346203dc..da984a9941 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -53,6 +53,7 @@ begin
   -- test scenarios for optimization of updates of R/W expanded objects
   a := array_append(a, 42);  -- optimizable using "transfer" method
   a := a || a[3];  -- optimizable using "inplace" method
+  a := a[1] || a;  -- ditto, but let's test array_prepend
   a := a || a;     -- not optimizable
   raise notice 'a = %', a;
 end$$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4d4bf62b6e..62c63e3728 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2804,6 +2804,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestModifyInPlace
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.43.5
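
To make the contract in the pl_exec.c hunk above concrete: the support function is handed a SupportRequestModifyInPlace carrying funcid, args and paramid, and answers with either the Param that may be passed read/write or NULL. Below is a toy model of that decision for an append-like function, where only the first argument may be the assignment target. The Mock* types and mock_append_support() are invented stand-ins so the sketch compiles on its own; they are not the real PostgreSQL node definitions, and a real support function would use the fmgr calling convention.

```c
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL node types; NOT the real definitions. */
typedef enum { MOCK_PARAM, MOCK_OTHER } MockNodeTag;
typedef enum { MOCK_PARAM_EXTERN, MOCK_PARAM_EXEC } MockParamKind;

typedef struct MockParam
{
	MockNodeTag tag;			/* models IsA(node, Param) */
	MockParamKind paramkind;	/* models Param.paramkind */
	int			paramid;		/* models Param.paramid */
} MockParam;

/* Models the fields the caller fills in: args and paramid. */
typedef struct MockModifyInPlaceRequest
{
	MockParam **args;			/* direct arguments; entries may be NULL */
	int			nargs;
	int			paramid;		/* ID of the assignment target */
} MockModifyInPlaceRequest;

/*
 * Toy support function for an append-like function: in-place update is
 * allowed only if the first argument is the assignment target itself.
 * Returns that argument on success, NULL otherwise.
 */
MockParam *
mock_append_support(const MockModifyInPlaceRequest *req)
{
	MockParam  *arg;

	if (req->nargs < 1)
		return NULL;

	arg = req->args[0];
	if (arg != NULL &&
		arg->tag == MOCK_PARAM &&
		arg->paramkind == MOCK_PARAM_EXTERN &&
		arg->paramid == req->paramid)
		return arg;

	return NULL;
}
```

On a non-NULL result, the caller in the hunk above sets expr_rwopt to PLPGSQL_RWOPT_INPLACE and remembers the returned Param; on NULL it leaves the expression marked unsafe.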




* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 03:02  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 34+ messages in thread

From: Michel Pelletier @ 2025-02-03 03:02 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Borisov <[email protected]>; Andrey Borodin <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

On Sun, Feb 2, 2025 at 1:57 PM Tom Lane <[email protected]> wrote:

> I wrote:
> > Hmm, it seemed to still apply for me.  But anyway, I needed to make
> > the other changes, so here's v4.
>
> PFA v5.  The new 0001 patch refactors the free_xxx infrastructure
> to create plpgsql_statement_tree_walker(), and then in what's now
> 0003 we can use that instead of writing a lot of duplicate code.
>

Thanks Tom!  These patches apply for me and all my tests and benchmarks are
still good.

-Michel



* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 09:19  Andrey Borodin <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 34+ messages in thread

From: Andrey Borodin @ 2025-02-03 09:19 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Borisov <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]



> On 3 Feb 2025, at 02:56, Tom Lane <[email protected]> wrote:
> 
> I decided to see what would happen if we tried to avoid the code
> duplication in pl_funcs.c by making some "walker" infrastructure
> akin to expression_tree_walker.  While that doesn't seem useful
> for the dump_xxx functions, it works very nicely for the free_xxx
> functions and now for the mark_xxx ones as well.  pl_funcs.c
> nets out about 400 lines shorter than in the v4 patch.  The
> code coverage score for the file is still awful :-(, but that's
> because we're not testing the dump_xxx functions at all.
> 
> PFA v5.  The new 0001 patch refactors the free_xxx infrastructure
> to create plpgsql_statement_tree_walker(), and then in what's now
> 0003 we can use that instead of writing a lot of duplicate code.


Pre-preliminary refactoring looks good to me, as does the rest of the patch set.

(Well, maybe paramarg2 resonates a bit, just from similarity with varchar2)

ecpg tests seem to fail on Windows[0], but it looks like that's not related to this thread.


Best regards, Andrey Borodin.

[0] https://cirrus-ci.com/task/4835794898124800






* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 11:42  Pavel Borisov <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Pavel Borisov @ 2025-02-03 11:42 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Tom Lane <[email protected]>; Andrey Borodin <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Hi, Tom

On Mon, 3 Feb 2025 at 07:02, Michel Pelletier
<[email protected]> wrote:
>
> On Sun, Feb 2, 2025 at 1:57 PM Tom Lane <[email protected]> wrote:
>>
>> I wrote:
>> > Hmm, it seemed to still apply for me.  But anyway, I needed to make
>> > the other changes, so here's v4.
>>
>> PFA v5.  The new 0001 patch refactors the free_xxx infrastructure
>> to create plpgsql_statement_tree_walker(), and then in what's now
>> 0003 we can use that instead of writing a lot of duplicate code.
>
>
> Thanks Tom!  These patches apply for me and all my tests and benchmarks are still good.

I've looked at patches v4 and v5.
The logic of patch 0003 in v5 is much easier to follow now that the
free_* functions have been refactored into a walker. So I think it's
much better than v4, even though the principle has not changed.

Using support functions in 0005 complicates things. But I think it's
justified by extensibility and benchmarks demonstrated by Michel above
in the thread.

Overall, the patch set looks well structured and beneficial for
performance and extensibility, and it looks good to me.

Minor notes on the patches:

If dump_* functions could use the newly added walker, the code would
look better. I suppose the main complication is that dump_* functions
contain a lot of per-statement prints/formatting. So maybe a way to
implement this is to put these statements into the existing tree
walker, i.e. plpgsql_statement_tree_walker_impl(), and add an argument
bool dump_debug to it. In effect, when called with dump_debug=false,
plpgsql_statement_tree_walker_impl() would walk silently, and with
dump_debug=true it would walk and print what is currently printed
by the dump_* functions. Maybe there are other problems
besides this?

For exec_check_rw_parameter():

I think renaming expr->expr_simple_expr to sexpr saves a few bytes but
doesn't make anything simpler, so if possible I'd prefer using just
expr->expr_simple_expr with the necessary casts. Furthermore, in this
function we mostly use the cast results fexpr, opexpr and sbsref of
expr->expr_simple_expr, which already have separate names.

Passing the target param as int paramid looks completely OK. But we
have an unconditional Assert(paramid == expr->target_param + 1),
so it looks like a redundant split as of now. Do we plan a true split
and removal of this assert in the future?

Thanks for creating and working on this patch!
Regards,
Pavel Borisov
Supabase







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 17:36  Tom Lane <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-02-03 17:36 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Pavel Borisov <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Andrey Borodin <[email protected]> writes:
> (Well, maybe paramarg2 resonates a bit, just from similarity with varchar2)

I'm not wedded to that name; do you have a better idea?

Another idea could be to make it an array:

-            void       *paramarg;    /* private data for same */
+            void       *paramarg[2]; /* private data for same */

That would require touching other code using that field, but there
probably isn't much --- at least within our own tree, plpgsql itself
is the only user of paramarg.  Still, possibly breaking code that
didn't need to be broken doesn't seem like an improvement.

> ecpg tests seem to fail on Windows[0], but it looks like that's not related to this thread.

Yeah, known problem with Meson dependencies[1]; it's breaking
pretty much all the cfbot's windows builds right now, although
maybe there's a timing issue that allows some to pass.

			regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAGECzQSvM3iSDmjF%2B%3DKof5an6jN8UbkP_4cKKT9w6GZavmb5yQ%4...







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 17:53  Tom Lane <[email protected]>
  parent: Pavel Borisov <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-02-03 17:53 UTC (permalink / raw)
  To: Pavel Borisov <[email protected]>; +Cc: Michel Pelletier <[email protected]>; Andrey Borodin <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Pavel Borisov <[email protected]> writes:
> Minor notes on the patches:

> If dump_* functions could use the newly added walker, the code would
> look better. I suppose the main complication is that dump_* functions
> contain a lot of per-statement prints/formatting. So maybe a way to
> implement this is to put these statements into the existing tree
> walker, i.e. plpgsql_statement_tree_walker_impl(), and add an argument
> bool dump_debug to it. In effect, when called with dump_debug=false,
> plpgsql_statement_tree_walker_impl() would walk silently, and with
> dump_debug=true it would walk and print what is currently printed
> by the dump_* functions. Maybe there are other problems
> besides this?

I'm not thrilled with that idea, mainly because it would add overhead
to the performance-relevant cases (mark and free) to benefit a rarely
used debugging feature.  I'm content to leave the debug code out of
this for now --- it seems to me that it's serving a different master
and doesn't have to be unified with the other routines.
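
For readers who haven't opened the patch, here is a rough sketch of the callback-based walker shape under discussion. It is a toy model with invented Toy* names and a uniform statement layout, not the actual plpgsql_statement_tree_walker() code. The walker only knows how to reach a statement's direct children; each consumer (here a mark-like pass) supplies its own callbacks and drives the recursion.

```c
#include <stddef.h>

/* Toy stand-ins; NOT the real PLpgSQL_expr / PLpgSQL_stmt structs. */
typedef struct ToyExpr
{
	int			marked;			/* set by the mark-like pass below */
} ToyExpr;

typedef struct ToyStmt
{
	ToyExpr    *expr;			/* expression owned by this statement, or NULL */
	struct ToyStmt **body;		/* child statements, or NULL */
	int			nbody;
} ToyStmt;

typedef void (*toy_stmt_callback) (ToyStmt *stmt, void *context);
typedef void (*toy_expr_callback) (ToyExpr *expr, void *context);

/*
 * Visit the direct children of one statement.  The walker itself does
 * not recurse; the statement callback decides whether to descend.
 */
void
toy_statement_tree_walker(ToyStmt *stmt,
						  toy_stmt_callback stmt_cb,
						  toy_expr_callback expr_cb,
						  void *context)
{
	int			i;

	if (stmt->expr != NULL)
		expr_cb(stmt->expr, context);
	for (i = 0; i < stmt->nbody; i++)
		stmt_cb(stmt->body[i], context);
}

/* Mark-like consumer: flag every expression and count them in *context. */
void
toy_mark_expr(ToyExpr *expr, void *context)
{
	expr->marked = 1;
	(*(int *) context)++;
}

void
toy_mark_stmt(ToyStmt *stmt, void *context)
{
	/* Nothing to do for the statement itself; just descend. */
	toy_statement_tree_walker(stmt, toy_mark_stmt, toy_mark_expr, context);
}
```

A free-like pass would reuse toy_statement_tree_walker() with different callbacks. A dump-style pass also needs per-statement formatting around the children, which is the part that does not fit this shape well.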

> For exec_check_rw_parameter():

> I think renaming expr->expr_simple_expr to sexpr saves a few bytes but
> doesn't make anything simpler, so if possible I'd prefer using just
> expr->expr_simple_expr with the necessary casts. Furthermore, in this
> function we mostly use the cast results fexpr, opexpr and sbsref of
> expr->expr_simple_expr, which already have separate names.

Hmm, I thought it looked cleaner like this, but I agree beauty
is in the eye of the beholder.  Anybody else have a preference?

> Passing the target param as int paramid looks completely OK. But we
> have an unconditional Assert(paramid == expr->target_param + 1),
> so it looks like a redundant split as of now. Do we plan a true split
> and removal of this assert in the future?

We've already fetched the target variable using the paramid (cf
plpgsql_param_eval_var_check), so I think checking that the
expression does match it seems like a useful sanity check.
Agreed, it shouldn't ever not match, but that's why that's just
an Assert.

> Thanks for creating and working on this patch!

Thanks for reviewing it!

			regards, tom lane







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 17:57  Andrey Borodin <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Andrey Borodin @ 2025-02-03 17:57 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Pavel Borisov <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]



> On 3 Feb 2025, at 22:36, Tom Lane <[email protected]> wrote:
> 
> I'm not wedded to that name; do you have a better idea?

I'd propose something like the attached. But feel free to ignore my suggestion: I do not understand the context of these structure members.


Best regards, Andrey Borodin.


Attachments:

  [application/octet-stream] rename.diff (1.8K, 2-rename.diff)
  inline diff:
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 191d8fe34d..594cdc2bcb 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -424,8 +424,8 @@ typedef struct ExprEvalStep
 		struct
 		{
 			ExecEvalSubroutine paramfunc;	/* add-on evaluation subroutine */
-			void	   *paramarg;	/* private data for same */
-			void	   *paramarg2;	/* more private data for same */
+			void	   *exprarg;	/* private data for PLpgSQL_expr* */
+			void	   *paramarg;	/* more private data for Param* */
 			int			paramid;	/* numeric ID for parameter */
 			Oid			paramtype;	/* OID of parameter's datatype */
 		}			cparam;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4377ceecb..ddc8e511cc 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -6505,8 +6505,8 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	 * pointers to the PLpgSQL_expr as well as this specific Param, to support
 	 * plpgsql_param_eval_var_check().
 	 */
-	scratch.d.cparam.paramarg = expr;
-	scratch.d.cparam.paramarg2 = param;
+	scratch.d.cparam.exprarg = expr;
+	scratch.d.cparam.paramarg = param;
 	scratch.d.cparam.paramid = param->paramid;
 	scratch.d.cparam.paramtype = param->paramtype;
 	ExprEvalPushStep(state, &scratch);
@@ -6550,8 +6550,8 @@ plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
 	if (!var->isnull &&
 		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
 	{
-		PLpgSQL_expr *expr = (PLpgSQL_expr *) op->d.cparam.paramarg;
-		Param	   *param = (Param *) op->d.cparam.paramarg2;
+		PLpgSQL_expr *expr = (PLpgSQL_expr *) op->d.cparam.exprarg;
+		Param	   *param = (Param *) op->d.cparam.paramarg;
 
 		/*
 		 * We might have already figured this out while evaluating some other



* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 18:18  Tom Lane <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tom Lane @ 2025-02-03 18:18 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Pavel Borisov <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Andrey Borodin <[email protected]> writes:
>> On 3 Feb 2025, at 22:36, Tom Lane <[email protected]> wrote:
>> I'm not wedded to that name; do you have a better idea?

> I'd propose something like the attached. But feel free to ignore my suggestion: I do not understand the context of these structure members.

Hmm, you're suggesting naming those field members after PL/pgSQL's
specific use of them.  But the intent was that they are generic
workspace for anything that provides an EEOP_PARAM_CALLBACK
callback --- that is, the "param" in the field name refers to the
fact that this is an expression step for some kind of Param, and
not to what PL/pgSQL happens to do with the field.

Admittedly this is all moot unless some other extension starts
using EEOP_PARAM_CALLBACK, and I didn't find any evidence of that
using Debian Code Search.  But I don't want to think of
EEOP_PARAM_CALLBACK as being specifically tied to PL/pgSQL.

			regards, tom lane







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-03 18:48  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 34+ messages in thread

From: Tom Lane @ 2025-02-03 18:48 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Pavel Borisov <[email protected]>; Michel Pelletier <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

I wrote:
> Admittedly this is all moot unless some other extension starts
> using EEOP_PARAM_CALLBACK, and I didn't find any evidence of that
> using Debian Code Search.  But I don't want to think of
> EEOP_PARAM_CALLBACK as being specifically tied to PL/pgSQL.

However ... given that I failed to find any outside users today,
I'm warming to the idea of "void *paramarg[2]".  That does look
less random than two separate fields.  There are probably not
any extensions that would need to change their code, and even
if there are, we impose bigger API breaks than this one in
every major release.

So I'm willing to do that if it satisfies your concern.
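
For concreteness, a sketch of how the call sites might read with the array form. The Demo* types and function names are invented stand-ins so that this compiles on its own. Only the paramarg[2] field itself comes from the proposal above; which slot holds which pointer is an assumption.

```c
#include <stddef.h>

/* Stand-in for the cparam member of ExprEvalStep; NOT the real definition. */
typedef struct DemoCParam
{
	void	   *paramarg[2];	/* private data for the callback */
	int			paramid;		/* numeric ID for parameter */
} DemoCParam;

/* Stand-ins for PLpgSQL_expr and Param, reduced to what the demo needs. */
typedef struct DemoPlExpr
{
	const char *query;
} DemoPlExpr;

typedef struct DemoParam
{
	int			paramid;
} DemoParam;

/* Compile-time side: stash both pointers in the step's workspace. */
void
demo_param_compile(DemoCParam *step, DemoPlExpr *expr, DemoParam *param)
{
	step->paramarg[0] = expr;	/* assumed slot for the expression */
	step->paramarg[1] = param;	/* assumed slot for this specific Param */
	step->paramid = param->paramid;
}

/* Run-time side: fetch them back and check that they are consistent. */
int
demo_param_matches(const DemoCParam *step)
{
	DemoPlExpr *expr = (DemoPlExpr *) step->paramarg[0];
	DemoParam  *param = (DemoParam *) step->paramarg[1];

	return expr != NULL && param != NULL &&
		param->paramid == step->paramid;
}
```

Any out-of-tree user of the single-pointer field would need the same one-line change, from paramarg to paramarg[0].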

			regards, tom lane







* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2025-02-04 11:54  Pavel Borisov <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 34+ messages in thread

From: Pavel Borisov @ 2025-02-04 11:54 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Michel Pelletier <[email protected]>; Andrey Borodin <[email protected]>; Pavel Stehule <[email protected]>; [email protected]

Hi, Tom!

On Mon, 3 Feb 2025 at 21:53, Tom Lane <[email protected]> wrote:
>
> Pavel Borisov <[email protected]> writes:
> > Minor notes on the patches:
>
> > If dump_* functions could use the newly added walker, the code would
> > look better. I suppose the main complication is that dump_* functions
> > contain a lot of per-statement prints/formatting. So maybe a way to
> > implement this is to put these statements into the existing tree
> > walker, i.e. plpgsql_statement_tree_walker_impl(), and add an argument
> > bool dump_debug to it. In effect, when called with dump_debug=false,
> > plpgsql_statement_tree_walker_impl() would walk silently, and with
> > dump_debug=true it would walk and print what is currently printed
> > by the dump_* functions. Maybe there are other problems
> > besides this?
>
> I'm not thrilled with that idea, mainly because it would add overhead
> to the performance-relevant cases (mark and free) to benefit a rarely
> used debugging feature.  I'm content to leave the debug code out of
> this for now --- it seems to me that it's serving a different master
> and doesn't have to be unified with the other routines.
Makes sense.

> > For exec_check_rw_parameter():
>
> > I think renaming expr->expr_simple_expr to sexpr saves a few bytes but
> > doesn't make anything simpler, so if possible I'd prefer using just
> > expr->expr_simple_expr with the necessary casts. Furthermore, in this
> > function we mostly use the cast results fexpr, opexpr and sbsref of
> > expr->expr_simple_expr, which already have separate names.
>
> Hmm, I thought it looked cleaner like this, but I agree beauty
> is in the eye of the beholder.  Anybody else have a preference?
>
> > Passing the target param as int paramid looks completely OK. But we
> > have an unconditional Assert(paramid == expr->target_param + 1),
> > so it looks like a redundant split as of now. Do we plan a true split
> > and removal of this assert in the future?
>
> We've already fetched the target variable using the paramid (cf
> plpgsql_param_eval_var_check), so I think checking that the
> expression does match it seems like a useful sanity check.
> Agreed, it shouldn't ever not match, but that's why that's just
> an Assert.
There are no problems with these.

Regards,
Pavel Borisov








end of thread, other threads:[~2025-02-04 11:54 UTC | newest]

Thread overview: 34+ messages
2024-11-19 19:45 Re: Using Expanded Objects other than Arrays from plpgsql Tom Lane <[email protected]>
2024-11-19 20:52 ` Michel Pelletier <[email protected]>
2024-11-25 02:02   ` Michel Pelletier <[email protected]>
2024-12-04 00:42     ` Tom Lane <[email protected]>
2024-12-07 00:51       ` Michel Pelletier <[email protected]>
2024-12-08 07:05         ` Michel Pelletier <[email protected]>
2024-12-18 20:22           ` Tom Lane <[email protected]>
2024-12-23 03:52             ` Michel Pelletier <[email protected]>
2024-12-23 16:26               ` Tom Lane <[email protected]>
2024-12-25 20:25                 ` Michel Pelletier <[email protected]>
2024-12-27 00:32                   ` Tom Lane <[email protected]>
2024-12-05 20:34     ` Tom Lane <[email protected]>
2025-01-04 16:37 ` Michel Pelletier <[email protected]>
2025-01-04 19:35 ` Tom Lane <[email protected]>
2025-01-04 20:34   ` Michel Pelletier <[email protected]>
2025-01-15 18:09     ` Tom Lane <[email protected]>
2025-01-21 18:05       ` Michel Pelletier <[email protected]>
2025-01-21 18:12         ` Tom Lane <[email protected]>
2025-01-26 10:07           ` Andrey Borodin <[email protected]>
2025-01-26 15:37             ` Tom Lane <[email protected]>
2025-01-26 17:04               ` Andrey Borodin <[email protected]>
2025-01-26 19:04                 ` Tom Lane <[email protected]>
2025-01-30 20:32                   ` Pavel Borisov <[email protected]>
2025-01-31 01:53                     ` Tom Lane <[email protected]>
2025-02-02 21:56                       ` Tom Lane <[email protected]>
2025-02-03 03:02                         ` Michel Pelletier <[email protected]>
2025-02-03 11:42                           ` Pavel Borisov <[email protected]>
2025-02-03 17:53                             ` Tom Lane <[email protected]>
2025-02-04 11:54                               ` Pavel Borisov <[email protected]>
2025-02-03 09:19                         ` Andrey Borodin <[email protected]>
2025-02-03 17:36                           ` Tom Lane <[email protected]>
2025-02-03 17:57                             ` Andrey Borodin <[email protected]>
2025-02-03 18:18                               ` Tom Lane <[email protected]>
2025-02-03 18:48                                 ` Tom Lane <[email protected]>
