public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Using Expanded Objects other than Arrays from plpgsql
11+ messages / 3 participants
[nested] [flat]

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-10-23 16:04  Tom Lane <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Tom Lane @ 2024-10-23 16:04 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: [email protected]

I wrote:
> One idea I was toying with is that it doesn't matter if f()
> throws an error so long as the plpgsql function is not executing
> within an exception block: if the error propagates out of the plpgsql
> function then we no longer care about the value of the variable.
> That would very substantially weaken the requirements on how f()
> is implemented.

The more I think about this idea the better I like it.  We can
improve on the original concept a bit: the assignment can be
within an exception block so long as the target variable is too.
For example, consider

	DECLARE x float8[];
	BEGIN
	   ...
	   DECLARE y float8[];
	   BEGIN
	      x := array_append(x, 42);
	      y := array_append(y, 42);
	   END;
	EXCEPTION WHEN ...;
	END;

Currently, both calls of array_append are subject to R/W optimization,
so that array_append must provide a strong guarantee that it won't
throw an error after it's begun to change the R/W object.  If we
redefine things so that the optimization is applied only to "y",
then AFAICS we need nothing from array_append.  It only has to be
sure it doesn't corrupt the object so badly that it can't be freed
... but that requirement exists already, for anything dealing with
expanded objects.  So this would put us in a situation where we
could apply the optimization by default, which'd be a huge win.

There is an exception: if we are considering

	      x := array_cat(x, x);

then I don't think we can optimize because of the aliasing problem
I mentioned before.  So there'd have to be a restriction that the
target variable is mentioned only once in the function's arguments.
For stuff like your vxm() function, that'd be annoying.  But functions
that need that and are willing to deal with the aliasing hazard could
still provide a prosupport function that promises it's okay.  What
we'd accomplish is that a large fraction of interesting functions
could get the benefit without having to create a prosupport function,
which is a win all around.

Also worth noting: in the above example, we could optimize the
update on "x" too, if we know that "x" is not referenced in the
block's EXCEPTION handlers.  I wouldn't bother with this in the
first version, but it might be worth doing later.

So if we go this way, the universe of functions that can benefit
from the optimization enlarges considerably, and the risk of bugs
that break the optimization drops considerably.  The cost is that
some cases that were optimized before now will not be.  But I
suspect that plpgsql functions where this optimization is key
probably don't contain EXCEPTION handlers at all, so that they
won't notice any change.

Thoughts?

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-10-24 02:10  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Michel Pelletier @ 2024-10-24 02:10 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

On Wed, Oct 23, 2024 at 9:04 AM Tom Lane <[email protected]> wrote:

> I wrote:
> > One idea I was toying with is that it doesn't matter if f()
> > throws an error so long as the plpgsql function is not executing
> > within an exception block: if the error propagates out of the plpgsql
> > function then we no longer care about the value of the variable.
> > That would very substantially weaken the requirements on how f()
> > is implemented.
>
> The more I think about this idea the better I like it.  We can
> improve on the original concept a bit: the assignment can be
> within an exception block so long as the target variable is too.
> For example, consider
>
>         DECLARE x float8[];
>         BEGIN
>            ...
>            DECLARE y float8[];
>            BEGIN
>               x := array_append(x, 42);
>               y := array_append(y, 42);
>            END;
>         EXCEPTION WHEN ...;
>         END;
>
> Currently, both calls of array_append are subject to R/W optimization,
> so that array_append must provide a strong guarantee that it won't
> throw an error after it's begun to change the R/W object.  If we
> redefine things so that the optimization is applied only to "y",
> then AFAICS we need nothing from array_append.  It only has to be
> sure it doesn't corrupt the object so badly that it can't be freed
> ... but that requirement exists already, for anything dealing with
> expanded objects.  So this would put us in a situation where we
> could apply the optimization by default, which'd be a huge win.
>

Great!  I can make that same guarantee.


> There is an exception: if we are considering
>
>               x := array_cat(x, x);
>
> then I don't think we can optimize because of the aliasing problem
> I mentioned before.  So there'd have to be a restriction that the
> target variable is mentioned only once in the function's arguments.
> For stuff like your vxm() function, that'd be annoying.  But functions
> that need that and are willing to deal with the aliasing hazard could
> still provide a prosupport function that promises it's okay.  What
> we'd accomplish is that a large fraction of interesting functions
> could get the benefit without having to create a prosupport function,
> which is a win all around.
>

I see, I'm not completely clear on the prosupport code so let me repeat it
back just so I know I'm getting it right, it looks like I'll need to write
a C function, that I specify with CREATE FUNCTION ... SUPPORT, that the
planner will call asking me to tell it that it's ok to alias arguments
(which is fine for SuiteSparse so no problem).  You also mentioned a couple
emails back about a "type support" feature similar to prosupport, that
would allow me to specify an eager expansion function for my types.


> Also worth noting: in the above example, we could optimize the
> update on "x" too, if we know that "x" is not referenced in the
> block's EXCEPTION handlers.  I wouldn't bother with this in the
> first version, but it might be worth doing later.
>
> So if we go this way, the universe of functions that can benefit
> from the optimization enlarges considerably, and the risk of bugs
> that break the optimization drops considerably.  The cost is that
> some cases that were optimized before now will not be.  But I
> suspect that plpgsql functions where this optimization is key
> probably don't contain EXCEPTION handlers at all, so that they
> won't notice any change.
>

Sounds like a good tradeoff to me!  Hopefully if anyone does have concerns
with this approach they'll see this thread and comment.

Thanks,

-Michel

>
>                         regards, tom lane
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-10-24 02:38  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Tom Lane @ 2024-10-24 02:38 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: [email protected]

Michel Pelletier <[email protected]> writes:
> On Wed, Oct 23, 2024 at 9:04 AM Tom Lane <[email protected]> wrote:
>> For stuff like your vxm() function, that'd be annoying.  But functions
>> that need that and are willing to deal with the aliasing hazard could
>> still provide a prosupport function that promises it's okay.  What
>> we'd accomplish is that a large fraction of interesting functions
>> could get the benefit without having to create a prosupport function,
>> which is a win all around.

> I see, I'm not completely clear on the prosupport code so let me repeat it
> back just so I know I'm getting it right, it looks like I'll need to write
> a C function, that I specify with CREATE FUNCTION ... SUPPORT, that the
> planner will call asking me to tell it that it's ok to alias arguments
> (which is fine for SuiteSparse so no problem).  You also mentioned a couple
> emails back about a "type support" feature similar to prosupport, that
> would allow me to specify an eager expansion function for my types.

Right.  You could omit the support function for anything where
aliasing is impossible (e.g., there's only one matrix argument).
Also, very possibly multiple functions could share the same
support function.

>> Also worth noting: in the above example, we could optimize the
>> update on "x" too, if we know that "x" is not referenced in the
>> block's EXCEPTION handlers.  I wouldn't bother with this in the
>> first version, but it might be worth doing later.

Aside: I'm less excited about that than I was earlier.  It's not wrong
for the example I gave, but in more general cases it doesn't work:

         DECLARE x float8[];
         BEGIN
            ...
            BEGIN
               x := array_append(x, 42);
            EXCEPTION WHEN ...;
            END;
         // we might use x here
         END;

So for an "x" in a further-out DECLARE section, we'd have to track not
only whether x is used in the EXCEPTION clause, but whether it's used
anyplace that can be reached after the exception block ends.  That's
starting to sound overly complicated for the benefit.

>> So if we go this way, the universe of functions that can benefit
>> from the optimization enlarges considerably, and the risk of bugs
>> that break the optimization drops considerably.  The cost is that
>> some cases that were optimized before now will not be.

> Sounds like a good tradeoff to me!  Hopefully if anyone does have concerns
> with this approach they'll see this thread and comment.

I realized that we could suppress any complaints of that ilk by
creating prosupport functions for the three built-in functions that
are handled today, allowing them to operate on the same rules as now.
I might not bother with that on purely performance grounds, but it's
likely worth doing simply to provide an in-core test case for the code
path where a prosupport function can be called.  I'm still writing up
details, but right now I'm envisioning completely separate sets of
rules for the prosupport case versus the no-prosupport case.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-10-24 18:32  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Tom Lane @ 2024-10-24 18:32 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: [email protected]

I wrote:
> ... I'm still writing up
> details, but right now I'm envisioning completely separate sets of
> rules for the prosupport case versus the no-prosupport case.

So here is the design I've come up with for optimizing R/W expanded
object updates in plpgsql without any special knowledge from a
prosupport function.  AFAICS this requires no assumptions at all
about the behavior of called functions, other than the bare minimum
"you can't corrupt the object to the point where it wouldn't be
cleanly free-able".  In particular that means it can work for
user-written called functions in plpgsql, SQL, or whatever, not
only for C-coded functions.

There are two requirements to apply the optimization:

* If the assignment statement is within a BEGIN ... EXCEPTION block,
its target variable must be declared inside the most-closely-nested
such block.  This ensures that if an error is thrown from within the
assignment statement's expression, we do not care about the value
of the target variable, except to the extent of being able to clean
it up.

* The target variable must be referenced exactly once within the
RHS expression.  This avoids aliasing hazards such as we discussed
upthread.  But unlike the existing rule, that reference can be
anywhere in the RHS --- it doesn't have to be an argument of the
topmost function.  So for example we can optimize
	foo := fee(fi(fo(fum(foo), other_variable), ...));

While I've not tried to write any code yet, I think both of these
conditions should be reasonably easy to verify.

Given that those conditions are met and the current value of the
assignment target variable is a R/W expanded pointer, we can
execute the assignment as follows:

* Provide the R/W expanded pointer as the value of the Param
representing the sole reference.  At the same time, apply
TransferExpandedObject to reparent the object under the transient
eval_mcontext memory context that's being used to evaluate the
RHS expression, and then set the target variable's value to NULL.
(At this point, the R/W object has exactly the same logical
status as any intermediate calculation result that's palloc'd
in the eval_mcontext.)

* At successful completion of the RHS, assign the result to the
target variable normally.  This includes, if it's an R/W expanded
object, reparenting it under the calling function's main context.

If the RHS expression winds up returning the same expanded object
(probably mutated), then the outer function regains ownership of it
after no more than a couple of TransferExpandedObject calls, which
are cheap.  If the RHS returns something different, then either the
original expanded object got discarded during RHS evaluation, or it
will be cleaned up when we reset the eval_mcontext, so that it won't
be leaked.

I didn't originally foresee the need to transfer the object
into the transient memory context.  But this design avoids any
possibility of double-free attempts, which would be likely to
happen if we allow the outer function's variable to hold onto
a reference to the object while the RHS is evaluated.  A function
receiving an R/W reference is entitled to assume that that value
is not otherwise referenced and can be freed when it's no longer
needed, so it might well get freed during RHS evaluation.  By
converting the original R/W object into (effectively) a temporary
value within the RHS evaluation, we make that assumption valid.

So, while this design greatly expands the set of cases we can
optimize, it does lose some cases that the old approach could
support.  I envision addressing that by allowing a prosupport
function attached to the RHS' topmost function to "bless"
other cases as safe, using reasoning similar to the old rules.
(Or different rules, even, but it's on the prosupport function
to be sure it's safe.)  I don't have a detailed design in mind,
but I'm thinking along the lines of just passing the whole RHS
expression to the prosupport function and letting it decide
what's safe.  In any case, we don't need to even call the
prosupport function unless there's an exception block or
multiple RHS references to the target variable.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-10-31 23:51  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Michel Pelletier @ 2024-10-31 23:51 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

On Thu, Oct 24, 2024 at 11:32 AM Tom Lane <[email protected]> wrote:

> I wrote:
> > ... I'm still writing up
> > details, but right now I'm envisioning completely separate sets of
> > rules for the prosupport case versus the no-prosupport case.
>
> So here is the design I've come up with for optimizing R/W expanded
> object updates in plpgsql without any special knowledge from a
> prosupport function.  AFAICS this requires no assumptions at all
> about the behavior of called functions, other than the bare minimum
> "you can't corrupt the object to the point where it wouldn't be
> cleanly free-able".  In particular that means it can work for
> user-written called functions in plpgsql, SQL, or whatever, not
> only for C-coded functions.
>

Great, I checked with the upstream library authors and they verified that
the object can't be corrupted to where it can't be freed.  Since my
expanded objects are just a box around a library handle, I use a
MemoryContext callback to call the library free function when the context
cleans up, and we can't think of a path where that will fail.


>
> There are two requirements to apply the optimization:
>
> * If the assignment statement is within a BEGIN ... EXCEPTION block,
> its target variable must be declared inside the most-closely-nested
> such block.  This ensures that if an error is thrown from within the
> assignment statement's expression, we do not care about the value
> of the target variable, except to the extent of being able to clean
> it up.
>

My users are writing algebraic expressions to be done in bulk on GPUs,
etc.  I don't think I have to worry too much about wrapping stuff in
exception blocks while handling my library objects.

<snip>

> While I've not tried to write any code yet, I think both of these
> conditions should be reasonably easy to verify.
>
> Given that those conditions are met and the current value of the
> assignment target variable is a R/W expanded pointer, we can
> execute the assignment as follows:
>
> <snip>

> So, while this design greatly expands the set of cases we can
> optimize, it does lose some cases that the old approach could
> support.  I envision addressing that by allowing a prosupport
> function attached to the RHS' topmost function to "bless"
> other cases as safe, using reasoning similar to the old rules.
> (Or different rules, even, but it's on the prosupport function
> to be sure it's safe.)  I don't have a detailed design in mind,
> but I'm thinking along the lines of just passing the whole RHS
> expression to the prosupport function and letting it decide
> what's safe.  In any case, we don't need to even call the
> prosupport function unless there's an exception block or
> multiple RHS references to the target variable.
>

That all sounds great, and it sounds like my prosupport function just needs
to return true, or set some kind of flag saying aliasing is ok.  I'd like
to help as much as possible, but some of that reparenting stuff was pretty
deep for me, other than being a quick sanity check case, is there anything
I can do to help?


>
>                         regards, tom lane
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-01 22:27  Tom Lane <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Tom Lane @ 2024-11-01 22:27 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: [email protected]

Michel Pelletier <[email protected]> writes:
> That all sounds great, and it sounds like my prosupport function just needs
> to return true, or set some kind of flag saying aliasing is ok.  I'd like
> to help as much as possible, but some of that reparenting stuff was pretty
> deep for me, other than being a quick sanity check case, is there anything
> I can do to help?

I wasn't expecting you to write the code, but if you can test it and
see how much it helps your use-case, that would be great.

Here is a v1 patch series that does the first part of what we've been
talking about, which is to implement the new optimization rule for
the case of a single RHS reference to the target variable.  I'm
curious to know if it helps you at all by itself.  You'll definitely
also need commit 534d0ea6c, so probably applying these on our git
master branch would be the place to start.

			regards, tom lane



Attachments:

  [text/x-diff] v1-0001-Preliminary-refactoring.patch (9.5K, 2-v1-0001-Preliminary-refactoring.patch)
  download | inline diff:
From e207d9d377a75181f5bf655ffb81ef53328d3ea3 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Wed, 30 Oct 2024 15:57:35 -0400
Subject: [PATCH v1 1/3] Preliminary refactoring.

This short and boring patch simply moves the responsibility for
initializing PLpgSQL_expr.target_param into plpgsql parsing,
rather than doing it at first execution of the expr as before.
This doesn't save anything in terms of runtime, since the work was
trivial and done only once per expr anyway.  But it makes the info
available during parsing, which will be useful for the next step.

Likewise set PLpgSQL_expr.func during parsing.  According to the
comments, this was once impossible; but it's certainly possible
since we invented the plpgsql_curr_compile variable.  Again, this
saves little runtime, but it seems far cleaner conceptually.

While at it, I reordered stuff in struct PLpgSQL_expr to make it
clearer which fields are filled when, and merged some duplicative
code in pl_gram.y.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_exec.c | 27 ---------------
 src/pl/plpgsql/src/pl_gram.y | 65 ++++++++++++++++++++++++------------
 src/pl/plpgsql/src/plpgsql.h | 31 +++++++++--------
 3 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 86c5bd324a..ce7cdb6458 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4174,12 +4174,6 @@ exec_prepare_plan(PLpgSQL_execstate *estate,
 	SPIPlanPtr	plan;
 	SPIPrepareOptions options;
 
-	/*
-	 * The grammar can't conveniently set expr->func while building the parse
-	 * tree, so make sure it's set before parser hooks need it.
-	 */
-	expr->func = estate->func;
-
 	/*
 	 * Generate and save the plan
 	 */
@@ -5016,21 +5010,7 @@ exec_assign_expr(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 	 * If first time through, create a plan for this expression.
 	 */
 	if (expr->plan == NULL)
-	{
-		/*
-		 * Mark the expression as being an assignment source, if target is a
-		 * simple variable.  (This is a bit messy, but it seems cleaner than
-		 * modifying the API of exec_prepare_plan for the purpose.  We need to
-		 * stash the target dno into the expr anyway, so that it will be
-		 * available if we have to replan.)
-		 */
-		if (target->dtype == PLPGSQL_DTYPE_VAR)
-			expr->target_param = target->dno;
-		else
-			expr->target_param = -1;	/* should be that already */
-
 		exec_prepare_plan(estate, expr, 0);
-	}
 
 	value = exec_eval_expr(estate, expr, &isnull, &valtype, &valtypmod);
 	exec_assign_value(estate, target, value, isnull, valtype, valtypmod);
@@ -6282,13 +6262,6 @@ setup_param_list(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 		 * that they are interrupting an active use of parameters.
 		 */
 		paramLI->parserSetupArg = (void *) expr;
-
-		/*
-		 * Also make sure this is set before parser hooks need it.  There is
-		 * no need to save and restore, since the value is always correct once
-		 * set.  (Should be set already, but let's be sure.)
-		 */
-		expr->func = estate->func;
 	}
 	else
 	{
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 8182ce28aa..5431977d69 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -64,6 +64,10 @@ static	bool			tok_is_keyword(int token, union YYSTYPE *lval,
 static	void			word_is_not_variable(PLword *word, int location);
 static	void			cword_is_not_variable(PLcword *cword, int location);
 static	void			current_token_is_not_variable(int tok);
+static	PLpgSQL_expr	*make_plpgsql_expr(const char *query,
+										   RawParseMode parsemode);
+static	void			expr_is_assignment_source(PLpgSQL_expr *expr,
+												  PLpgSQL_datum *target);
 static	PLpgSQL_expr	*read_sql_construct(int until,
 											int until2,
 											int until3,
@@ -529,6 +533,10 @@ decl_statement	: decl_varname decl_const decl_datatype decl_collate decl_notnull
 									 errmsg("variable \"%s\" must have a default value, since it's declared NOT NULL",
 											var->refname),
 									 parser_errposition(@5)));
+
+						if (var->default_val != NULL)
+							expr_is_assignment_source(var->default_val,
+													  (PLpgSQL_datum *) var);
 					}
 				| decl_varname K_ALIAS K_FOR decl_aliasitem ';'
 					{
@@ -987,6 +995,7 @@ stmt_assign		: T_DATUM
 													   pmode,
 													   false, true,
 													   NULL, NULL);
+						expr_is_assignment_source(new->expr, $1.datum);
 
 						$$ = (PLpgSQL_stmt *) new;
 					}
@@ -2637,6 +2646,38 @@ current_token_is_not_variable(int tok)
 		yyerror("syntax error");
 }
 
+/* Convenience routine to construct a PLpgSQL_expr struct */
+static PLpgSQL_expr *
+make_plpgsql_expr(const char *query,
+				  RawParseMode parsemode)
+{
+	PLpgSQL_expr *expr = palloc0(sizeof(PLpgSQL_expr));
+
+	expr->query = pstrdup(query);
+	expr->parseMode = parsemode;
+	expr->func = plpgsql_curr_compile;
+	expr->ns = plpgsql_ns_top();
+	/* might get changed later during parsing: */
+	expr->target_param = -1;
+	/* other fields are left as zeroes until first execution */
+	return expr;
+}
+
+/* Mark a PLpgSQL_expr as being the source of an assignment to target */
+static void
+expr_is_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
+{
+	/*
+	 * Mark the expression as being an assignment source, if target is a
+	 * simple variable.  We don't currently support optimized assignments to
+	 * other DTYPEs.
+	 */
+	if (target->dtype == PLPGSQL_DTYPE_VAR)
+		expr->target_param = target->dno;
+	else
+		expr->target_param = -1;	/* should be that already */
+}
+
 /* Convenience routine to read an expression with one possible terminator */
 static PLpgSQL_expr *
 read_sql_expression(int until, const char *expected)
@@ -2774,13 +2815,7 @@ read_sql_construct(int until,
 	 */
 	plpgsql_append_source_text(&ds, startlocation, endlocation);
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = parsemode;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, parsemode);
 	pfree(ds.data);
 
 	if (valid_sql)
@@ -3102,13 +3137,7 @@ make_execsql_stmt(int firsttoken, int location, PLword *word)
 	while (ds.len > 0 && scanner_isspace(ds.data[ds.len - 1]))
 		ds.data[--ds.len] = '\0';
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_DEFAULT;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_DEFAULT);
 	pfree(ds.data);
 
 	check_sql_expr(expr->query, expr->parseMode, location);
@@ -3980,13 +4009,7 @@ read_cursor_args(PLpgSQL_var *cursor, int until)
 			appendStringInfoString(&ds, ", ");
 	}
 
-	expr = palloc0(sizeof(PLpgSQL_expr));
-	expr->query = pstrdup(ds.data);
-	expr->parseMode = RAW_PARSE_PLPGSQL_EXPR;
-	expr->plan = NULL;
-	expr->paramnos = NULL;
-	expr->target_param = -1;
-	expr->ns = plpgsql_ns_top();
+	expr = make_plpgsql_expr(ds.data, RAW_PARSE_PLPGSQL_EXPR);
 	pfree(ds.data);
 
 	/* Next we'd better find the until token */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 50c3b28472..fbb6000caa 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -219,14 +219,22 @@ typedef struct PLpgSQL_expr
 {
 	char	   *query;			/* query string, verbatim from function body */
 	RawParseMode parseMode;		/* raw_parser() mode to use */
-	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
-	Bitmapset  *paramnos;		/* all dnos referenced by this query */
+	struct PLpgSQL_function *func;	/* function containing this expr */
+	struct PLpgSQL_nsitem *ns;	/* namespace chain visible to this expr */
 
-	/* function containing this expr (not set until we first parse query) */
-	struct PLpgSQL_function *func;
+	/*
+	 * These fields are used to help optimize assignments to expanded-datum
+	 * variables.  If this expression is the source of an assignment to a
+	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 */
+	int			target_param;	/* dno of assign target, or -1 if none */
 
-	/* namespace chain visible to this expr */
-	struct PLpgSQL_nsitem *ns;
+	/*
+	 * Fields above are set during plpgsql parsing.  Remaining fields are left
+	 * as zeroes/NULLs until we first parse/plan the query.
+	 */
+	SPIPlanPtr	plan;			/* plan, or NULL if not made yet */
+	Bitmapset  *paramnos;		/* all dnos referenced by this query */
 
 	/* fields for "simple expression" fast-path execution: */
 	Expr	   *expr_simple_expr;	/* NULL means not a simple expr */
@@ -235,14 +243,11 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * These fields are used to optimize assignments to expanded-datum
-	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno; else it's -1.
-	 * If we match a Param within expr_simple_expr to such a variable, that
-	 * Param's address is stored in expr_rw_param; then expression code
-	 * generation will allow the value for that Param to be passed read/write.
+	 * If we match a Param within expr_simple_expr to the variable identified
+	 * by target_param, that Param's address is stored in expr_rw_param; then
+	 * expression code generation will allow the value for that Param to be
+	 * passed as a read/write expanded-object pointer.
 	 */
-	int			target_param;	/* dno of assign target, or -1 if none */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
-- 
2.43.5



  [text/x-diff] v1-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch (19.3K, 3-v1-0002-Detect-whether-plpgsql-assignment-targets-are-loc.patch)
  download | inline diff:
From dc1abe9852ba02b294509c3331dd2813901f841b Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Wed, 30 Oct 2024 16:03:06 -0400
Subject: [PATCH v1 2/3] Detect whether plpgsql assignment targets are "local"
 variables.

Mark whether the target of a potentially optimizable assignment
is "local", in the sense of being declared inside any exception
block that could trap an error thrown from the assignment.
(This implies that we needn't preserve the variable's value
in case of an error.)

Normally, this requires a post-parsing scan of the function's
parse tree, since we don't know while parsing a BEGIN ...
construct whether we will find EXCEPTION at its end.  However,
if there are no BEGIN ... EXCEPTION blocks in the function at
all, then all assignments are local, even those to variables
representing function arguments.  We optimize that common case
by initializing the target_is_local flags to "true", and fixing
them up with a post-scan only if we found EXCEPTION.

The scan is implemented by code that's largely copied-and-pasted
from the nearby code to scan a plpgsql parse tree for deletion.
It's a bit annoying to have three copies of that now, but I'm
not seeing a way to refactor it that would save much code on net.

Note that variables' default-value expressions are never interesting
for expanded-variable optimization, since they couldn't contain a
reference to the target variable anyway.  But the code is set up
to compute their target_param and target_is_local correctly anyway,
for consistency and in case someone thinks of a use for that data.

I added a bit of plpgsql_dumptree support to help verify that
this code sets the flags as expected.  I'm not set on keeping
that, but I do want to keep the addition of a plpgsql_dumptree
call in plpgsql_compile_inline.  It's at best an oversight that
"#option dump" doesn't work in a DO block.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/pl/plpgsql/src/pl_comp.c  |  12 +
 src/pl/plpgsql/src/pl_funcs.c | 398 ++++++++++++++++++++++++++++++++++
 src/pl/plpgsql/src/pl_gram.y  |  15 ++
 src/pl/plpgsql/src/plpgsql.h  |   7 +-
 4 files changed, 431 insertions(+), 1 deletion(-)

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 5633e3c790..c1ca93fe21 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -365,6 +365,7 @@ do_compile(FunctionCallInfo fcinfo,
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	/*
 	 * Initialize the compiler, particularly the namespace stack.  The
@@ -806,6 +807,9 @@ do_compile(FunctionCallInfo fcinfo,
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
 	/* Debug dump for completed functions */
 	if (plpgsql_DumpExecTree)
 		plpgsql_dumptree(function);
@@ -899,6 +903,7 @@ plpgsql_compile_inline(char *proc_source)
 
 	function->nstatements = 0;
 	function->requires_procedure_resowner = false;
+	function->has_exception_block = false;
 
 	plpgsql_ns_init();
 	plpgsql_ns_push(func_name, PLPGSQL_LABEL_BLOCK);
@@ -956,6 +961,13 @@ plpgsql_compile_inline(char *proc_source)
 
 	plpgsql_finish_datums(function);
 
+	if (function->has_exception_block)
+		plpgsql_mark_local_assignment_targets(function);
+
+	/* Debug dump for completed functions */
+	if (plpgsql_DumpExecTree)
+		plpgsql_dumptree(function);
+
 	/*
 	 * Pop the error context stack
 	 */
diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index eeb7c4d7c0..889377fc9a 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -333,6 +333,401 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind)
 }
 
 
+/**********************************************************************
+ * Mark assignment source expressions that have local target variables,
+ * that is, variables declared within the exception block most closely
+ * containing the assignment itself.  (Such target variables need not be
+ * preserved if the assignment's source expression raises an error,
+ * allowing better optimization.)
+ *
+ * This code need not be called if the plpgsql function contains no exception
+ * blocks, because expr_is_assignment_source() will have set all the flags
+ * to true already.  Also, we need not examine default-value expressions for
+ * variables, because variable declarations are necessarily within the nearest
+ * exception block.  (In DECLARE ... BEGIN ... EXCEPTION ... END, the variable
+ * initializations are done before entering the exception scope.)  So it's
+ * sufficient to find assignment statements.
+ *
+ * Within the recursion, local_dnos is a Bitmapset of dnos of variables
+ * known to be declared within the current exception level.
+ **********************************************************************/
+static void mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos);
+static void mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos);
+static void mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos);
+static void mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos);
+static void mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos);
+static void mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos);
+static void mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos);
+static void mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos);
+static void mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos);
+static void mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos);
+static void mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos);
+static void mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos);
+static void mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos);
+static void mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos);
+static void mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos);
+static void mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos);
+static void mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos);
+static void mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos);
+static void mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos);
+static void mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos);
+static void mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos);
+static void mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos);
+static void mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos);
+static void mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos);
+static void mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos);
+static void mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos);
+static void mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos);
+static void mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos);
+
+
+static void
+mark_stmt(PLpgSQL_stmt *stmt, Bitmapset *local_dnos)
+{
+	switch (stmt->cmd_type)
+	{
+		case PLPGSQL_STMT_BLOCK:
+			mark_block((PLpgSQL_stmt_block *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSIGN:
+			mark_assign((PLpgSQL_stmt_assign *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_IF:
+			mark_if((PLpgSQL_stmt_if *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CASE:
+			mark_case((PLpgSQL_stmt_case *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_LOOP:
+			mark_loop((PLpgSQL_stmt_loop *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_WHILE:
+			mark_while((PLpgSQL_stmt_while *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORI:
+			mark_fori((PLpgSQL_stmt_fori *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORS:
+			mark_fors((PLpgSQL_stmt_fors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FORC:
+			mark_forc((PLpgSQL_stmt_forc *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FOREACH_A:
+			mark_foreach_a((PLpgSQL_stmt_foreach_a *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXIT:
+			mark_exit((PLpgSQL_stmt_exit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN:
+			mark_return((PLpgSQL_stmt_return *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_NEXT:
+			mark_return_next((PLpgSQL_stmt_return_next *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RETURN_QUERY:
+			mark_return_query((PLpgSQL_stmt_return_query *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_RAISE:
+			mark_raise((PLpgSQL_stmt_raise *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ASSERT:
+			mark_assert((PLpgSQL_stmt_assert *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_EXECSQL:
+			mark_execsql((PLpgSQL_stmt_execsql *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNEXECUTE:
+			mark_dynexecute((PLpgSQL_stmt_dynexecute *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_DYNFORS:
+			mark_dynfors((PLpgSQL_stmt_dynfors *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_GETDIAG:
+			mark_getdiag((PLpgSQL_stmt_getdiag *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_OPEN:
+			mark_open((PLpgSQL_stmt_open *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_FETCH:
+			mark_fetch((PLpgSQL_stmt_fetch *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CLOSE:
+			mark_close((PLpgSQL_stmt_close *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_PERFORM:
+			mark_perform((PLpgSQL_stmt_perform *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_CALL:
+			mark_call((PLpgSQL_stmt_call *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_COMMIT:
+			mark_commit((PLpgSQL_stmt_commit *) stmt, local_dnos);
+			break;
+		case PLPGSQL_STMT_ROLLBACK:
+			mark_rollback((PLpgSQL_stmt_rollback *) stmt, local_dnos);
+			break;
+		default:
+			elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type);
+			break;
+	}
+}
+
+static void
+mark_stmts(List *stmts, Bitmapset *local_dnos)
+{
+	ListCell   *s;
+
+	foreach(s, stmts)
+	{
+		mark_stmt((PLpgSQL_stmt *) lfirst(s), local_dnos);
+	}
+}
+
+static void
+mark_block(PLpgSQL_stmt_block *block, Bitmapset *local_dnos)
+{
+	if (block->exceptions)
+	{
+		ListCell   *e;
+
+		/*
+		 * The block creates a new exception scope, so variables declared at
+		 * outer levels are nonlocal.  For that matter, so are any variables
+		 * declared in the block's DECLARE section.  Hence, we must pass down
+		 * empty local_dnos.
+		 */
+		mark_stmts(block->body, NULL);
+
+		foreach(e, block->exceptions->exc_list)
+		{
+			PLpgSQL_exception *exc = (PLpgSQL_exception *) lfirst(e);
+
+			mark_stmts(exc->action, NULL);
+		}
+	}
+	else
+	{
+		/*
+		 * Otherwise, the block does not create a new exception scope, and any
+		 * variables it declares can also be considered local within it.  Note
+		 * that only initializable datum types (VAR, REC) are included in
+		 * initvarnos; but that's sufficient for our purposes.
+		 */
+		local_dnos = bms_copy(local_dnos);
+		for (int i = 0; i < block->n_initvars; i++)
+			local_dnos = bms_add_member(local_dnos, block->initvarnos[i]);
+		mark_stmts(block->body, local_dnos);
+		bms_free(local_dnos);
+	}
+}
+
+static void
+mark_assign(PLpgSQL_stmt_assign *stmt, Bitmapset *local_dnos)
+{
+	PLpgSQL_expr *expr = stmt->expr;
+
+	/*
+	 * If the assignment target is a plain DTYPE_VAR datum, mark it as local
+	 * or not.  (If it's not a VAR, we don't care.)
+	 */
+	if (expr->target_param >= 0)
+		expr->target_is_local = bms_is_member(expr->target_param, local_dnos);
+}
+
+static void
+mark_if(PLpgSQL_stmt_if *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->then_body, local_dnos);
+	foreach(l, stmt->elsif_list)
+	{
+		PLpgSQL_if_elsif *elif = (PLpgSQL_if_elsif *) lfirst(l);
+
+		/* elif->cond cannot be an assignment source */
+		mark_stmts(elif->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_body, local_dnos);
+}
+
+static void
+mark_case(PLpgSQL_stmt_case *stmt, Bitmapset *local_dnos)
+{
+	ListCell   *l;
+
+	/* stmt->t_expr cannot be an assignment source */
+	foreach(l, stmt->case_when_list)
+	{
+		PLpgSQL_case_when *cwt = (PLpgSQL_case_when *) lfirst(l);
+
+		/* cwt->expr cannot be an assignment source */
+		mark_stmts(cwt->stmts, local_dnos);
+	}
+	mark_stmts(stmt->else_stmts, local_dnos);
+}
+
+static void
+mark_loop(PLpgSQL_stmt_loop *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_while(PLpgSQL_stmt_while *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fori(PLpgSQL_stmt_fori *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->lower, upper, step cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_fors(PLpgSQL_stmt_fors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+}
+
+static void
+mark_forc(PLpgSQL_stmt_forc *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->argquery cannot be an assignment source */
+}
+
+static void
+mark_foreach_a(PLpgSQL_stmt_foreach_a *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+	mark_stmts(stmt->body, local_dnos);
+}
+
+static void
+mark_open(PLpgSQL_stmt_open *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->argquery, query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_fetch(PLpgSQL_stmt_fetch *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_close(PLpgSQL_stmt_close *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_perform(PLpgSQL_stmt_perform *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_call(PLpgSQL_stmt_call *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_commit(PLpgSQL_stmt_commit *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_rollback(PLpgSQL_stmt_rollback *stmt, Bitmapset *local_dnos)
+{
+}
+
+static void
+mark_exit(PLpgSQL_stmt_exit *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond cannot be an assignment source */
+}
+
+static void
+mark_return(PLpgSQL_stmt_return *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_next(PLpgSQL_stmt_return_next *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->expr cannot be an assignment source */
+}
+
+static void
+mark_return_query(PLpgSQL_stmt_return_query *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query, dynquery cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_raise(PLpgSQL_stmt_raise *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->params cannot contain an assignment source */
+	/* stmt->options cannot contain an assignment source */
+}
+
+static void
+mark_assert(PLpgSQL_stmt_assert *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->cond, message cannot be an assignment source */
+}
+
+static void
+mark_execsql(PLpgSQL_stmt_execsql *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->sqlstmt cannot be an assignment source */
+}
+
+static void
+mark_dynexecute(PLpgSQL_stmt_dynexecute *stmt, Bitmapset *local_dnos)
+{
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_dynfors(PLpgSQL_stmt_dynfors *stmt, Bitmapset *local_dnos)
+{
+	mark_stmts(stmt->body, local_dnos);
+	/* stmt->query cannot be an assignment source */
+	/* stmt->params cannot contain an assignment source */
+}
+
+static void
+mark_getdiag(PLpgSQL_stmt_getdiag *stmt, Bitmapset *local_dnos)
+{
+}
+
+void
+plpgsql_mark_local_assignment_targets(PLpgSQL_function *func)
+{
+	Bitmapset  *local_dnos;
+
+	/* Function parameters can be treated as local targets at outer level */
+	local_dnos = NULL;
+	for (int i = 0; i < func->fn_nargs; i++)
+		local_dnos = bms_add_member(local_dnos, func->fn_argvarnos[i]);
+	if (func->action)
+		mark_block(func->action, local_dnos);
+	bms_free(local_dnos);
+}
+
+
 /**********************************************************************
  * Release memory when a PL/pgSQL function is no longer needed
  *
@@ -1594,6 +1989,9 @@ static void
 dump_expr(PLpgSQL_expr *expr)
 {
 	printf("'%s'", expr->query);
+	if (expr->target_param >= 0)
+		printf(" target %d%s", expr->target_param,
+			   expr->target_is_local ? " (local)" : "");
 }
 
 void
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 5431977d69..ddbfda8388 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2314,6 +2314,8 @@ exception_sect	:
 						PLpgSQL_exception_block *new = palloc(sizeof(PLpgSQL_exception_block));
 						PLpgSQL_variable *var;
 
+						plpgsql_curr_compile->has_exception_block = true;
+
 						var = plpgsql_build_variable("sqlstate", lineno,
 													 plpgsql_build_datatype(TEXTOID,
 																			-1,
@@ -2659,6 +2661,7 @@ make_plpgsql_expr(const char *query,
 	expr->ns = plpgsql_ns_top();
 	/* might get changed later during parsing: */
 	expr->target_param = -1;
+	expr->target_is_local = false;
 	/* other fields are left as zeroes until first execution */
 	return expr;
 }
@@ -2673,9 +2676,21 @@ expr_is_assignment_source(PLpgSQL_expr *expr, PLpgSQL_datum *target)
 	 * other DTYPEs.
 	 */
 	if (target->dtype == PLPGSQL_DTYPE_VAR)
+	{
 		expr->target_param = target->dno;
+
+		/*
+		 * For now, assume the target is local to the nearest enclosing
+		 * exception block.  That's correct if the function contains no
+		 * exception blocks; otherwise we'll update this later.
+		 */
+		expr->target_is_local = true;
+	}
 	else
+	{
 		expr->target_param = -1;	/* should be that already */
+		expr->target_is_local = false; /* ditto */
+	}
 }
 
 /* Convenience routine to read an expression with one possible terminator */
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index fbb6000caa..c6fadc5660 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -225,9 +225,12 @@ typedef struct PLpgSQL_expr
 	/*
 	 * These fields are used to help optimize assignments to expanded-datum
 	 * variables.  If this expression is the source of an assignment to a
-	 * simple variable, target_param holds that variable's dno (else it's -1).
+	 * simple variable, target_param holds that variable's dno (else it's -1),
+	 * and target_is_local indicates whether the target is declared inside the
+	 * closest exception block containing the assignment.
 	 */
 	int			target_param;	/* dno of assign target, or -1 if none */
+	bool		target_is_local;	/* is it within nearest exception block? */
 
 	/*
 	 * Fields above are set during plpgsql parsing.  Remaining fields are left
@@ -1014,6 +1017,7 @@ typedef struct PLpgSQL_function
 	/* data derived while parsing body */
 	unsigned int nstatements;	/* counter for assigning stmtids */
 	bool		requires_procedure_resowner;	/* contains CALL or DO? */
+	bool		has_exception_block;	/* contains BEGIN...EXCEPTION? */
 
 	/* these fields change when the function is used */
 	struct PLpgSQL_execstate *cur_estate;
@@ -1314,6 +1318,7 @@ extern PLpgSQL_nsitem *plpgsql_ns_find_nearest_loop(PLpgSQL_nsitem *ns_cur);
  */
 extern PGDLLEXPORT const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt);
 extern const char *plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind);
+extern void plpgsql_mark_local_assignment_targets(PLpgSQL_function *func);
 extern void plpgsql_free_function_memory(PLpgSQL_function *func);
 extern void plpgsql_dumptree(PLpgSQL_function *func);
 
-- 
2.43.5



  [text/x-diff] v1-0003-Implement-new-optimization-rule-for-updates-of-ex.patch (26.3K, 4-v1-0003-Implement-new-optimization-rule-for-updates-of-ex.patch)
  download | inline diff:
From 36cb6bb2dd860c2e3e02cdf871acde9efded97e1 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Fri, 1 Nov 2024 16:50:25 -0400
Subject: [PATCH v1 3/3] Implement new optimization rule for updates of
 expanded variables.

If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.

Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
---
 src/include/executor/execExpr.h               |   1 +
 src/pl/plpgsql/src/expected/plpgsql_array.out |   9 +
 src/pl/plpgsql/src/pl_exec.c                  | 377 +++++++++++++++---
 src/pl/plpgsql/src/plpgsql.h                  |  22 +-
 src/pl/plpgsql/src/sql/plpgsql_array.sql      |   9 +
 src/tools/pgindent/typedefs.list              |   2 +
 6 files changed, 358 insertions(+), 62 deletions(-)

diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index cd97dfa062..b5fa05fc63 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -405,6 +405,7 @@ typedef struct ExprEvalStep
 		{
 			ExecEvalSubroutine paramfunc;	/* add-on evaluation subroutine */
 			void	   *paramarg;	/* private data for same */
+			void	   *paramarg2;	/* more private data for same */
 			int			paramid;	/* numeric ID for parameter */
 			Oid			paramtype;	/* OID of parameter's datatype */
 		}			cparam;
diff --git a/src/pl/plpgsql/src/expected/plpgsql_array.out b/src/pl/plpgsql/src/expected/plpgsql_array.out
index ad60e0e8be..e5db6d6087 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_array.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_array.out
@@ -52,6 +52,15 @@ NOTICE:  a = ("{""(,11)""}",), a.c1[1].i = 11
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 NOTICE:  a = {1,2,3}
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+NOTICE:  a = {1,2,3,42,3,1,2,3,42,3}
 create temp table onecol as select array[1,2] as f1;
 do $$ declare a int[];
 begin a := f1 from onecol; raise notice 'a = %', a; end$$;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index ce7cdb6458..c29de7a30f 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -251,6 +251,15 @@ static HTAB *shared_cast_hash = NULL;
 	else \
 		Assert(rc == PLPGSQL_RC_OK)
 
+/* State struct for count_param_references */
+typedef struct count_param_references_context
+{
+	int			paramid;
+	int			count;
+	Param	   *last_param;
+} count_param_references_context;
+
+
 /************************************************************
  * Local function forward declarations
  ************************************************************/
@@ -336,7 +345,9 @@ static void exec_prepare_plan(PLpgSQL_execstate *estate,
 static void exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr);
 static bool exec_is_simple_query(PLpgSQL_expr *expr);
 static void exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan);
-static void exec_check_rw_parameter(PLpgSQL_expr *expr);
+static void exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid);
+static bool count_param_references(Node *node,
+								   count_param_references_context *context);
 static void exec_check_assignable(PLpgSQL_execstate *estate, int dno);
 static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
 								  PLpgSQL_expr *expr,
@@ -384,6 +395,10 @@ static ParamExternData *plpgsql_param_fetch(ParamListInfo params,
 static void plpgsql_param_compile(ParamListInfo params, Param *param,
 								  ExprState *state,
 								  Datum *resv, bool *resnull);
+static void plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+										 ExprContext *econtext);
+static void plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+											ExprContext *econtext);
 static void plpgsql_param_eval_var(ExprState *state, ExprEvalStep *op,
 								   ExprContext *econtext);
 static void plpgsql_param_eval_var_ro(ExprState *state, ExprEvalStep *op,
@@ -6078,10 +6093,13 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 
 		/*
 		 * Reset to "not simple" to leave sane state (with no dangling
-		 * pointers) in case we fail while replanning.  expr_simple_plansource
-		 * can be left alone however, as that cannot move.
+		 * pointers) in case we fail while replanning.  We'll need to
+		 * re-determine simplicity and R/W optimizability anyway, since those
+		 * could change with the new plan.  expr_simple_plansource can be left
+		 * alone however, as that cannot move.
 		 */
 		expr->expr_simple_expr = NULL;
+		expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 		expr->expr_rw_param = NULL;
 		expr->expr_simple_plan = NULL;
 		expr->expr_simple_plan_lxid = InvalidLocalTransactionId;
@@ -6439,16 +6457,27 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	scratch.resnull = resnull;
 
 	/*
-	 * Select appropriate eval function.  It seems worth special-casing
-	 * DTYPE_VAR and DTYPE_RECFIELD for performance.  Also, we can determine
-	 * in advance whether MakeExpandedObjectReadOnly() will be required.
-	 * Currently, only VAR/PROMISE and REC datums could contain read/write
-	 * expanded objects.
+	 * Select appropriate eval function.
+	 *
+	 * First, if this Param references the same varlena-type DTYPE_VAR datum
+	 * that is the target of the assignment containing this simple expression,
+	 * then it's possible we will be able to optimize handling of R/W expanded
+	 * datums.  We don't want to do the work needed to determine that unless
+	 * we actually see a R/W expanded datum at runtime, so install a checking
+	 * function that will figure that out when needed.
+	 *
+	 * Otherwise, it seems worth special-casing DTYPE_VAR and DTYPE_RECFIELD
+	 * for performance.  Also, we can determine in advance whether
+	 * MakeExpandedObjectReadOnly() will be required.  Currently, only
+	 * VAR/PROMISE and REC datums could contain read/write expanded objects.
 	 */
 	if (datum->dtype == PLPGSQL_DTYPE_VAR)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		bool		isvarlena = (((PLpgSQL_var *) datum)->datatype->typlen == -1);
+
+		if (isvarlena && dno == expr->target_param && expr->expr_simple_expr)
+			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_check;
+		else if (isvarlena)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_var;
@@ -6457,14 +6486,12 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_recfield;
 	else if (datum->dtype == PLPGSQL_DTYPE_PROMISE)
 	{
-		if (param != expr->expr_rw_param &&
-			((PLpgSQL_var *) datum)->datatype->typlen == -1)
+		if (((PLpgSQL_var *) datum)->datatype->typlen == -1)
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 		else
 			scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
 	}
-	else if (datum->dtype == PLPGSQL_DTYPE_REC &&
-			 param != expr->expr_rw_param)
+	else if (datum->dtype == PLPGSQL_DTYPE_REC)
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic_ro;
 	else
 		scratch.d.cparam.paramfunc = plpgsql_param_eval_generic;
@@ -6473,14 +6500,170 @@ plpgsql_param_compile(ParamListInfo params, Param *param,
 	 * Note: it's tempting to use paramarg to store the estate pointer and
 	 * thereby save an indirection or two in the eval functions.  But that
 	 * doesn't work because the compiled expression might be used with
-	 * different estates for the same PL/pgSQL function.
+	 * different estates for the same PL/pgSQL function.  Instead, store
+	 * pointers to the PLpgSQL_expr as well as this specific Param, to support
+	 * plpgsql_param_eval_var_check().
 	 */
-	scratch.d.cparam.paramarg = NULL;
+	scratch.d.cparam.paramarg = expr;
+	scratch.d.cparam.paramarg2 = param;
 	scratch.d.cparam.paramid = param->paramid;
 	scratch.d.cparam.paramtype = param->paramtype;
 	ExprEvalPushStep(state, &scratch);
 }
 
+/*
+ * plpgsql_param_eval_var_check		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we may need to determine the applicability of a read/write optimization,
+ * but we've not done that yet.
+ */
+static void
+plpgsql_param_eval_var_check(ExprState *state, ExprEvalStep *op,
+							 ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, it's time to
+	 * decide whether/how to optimize the assignment.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		PLpgSQL_expr *expr = (PLpgSQL_expr *) op->d.cparam.paramarg;
+		Param	   *param = (Param *) op->d.cparam.paramarg2;
+
+		/*
+		 * We might have already figured this out while evaluating some other
+		 * Param referencing the same variable.
+		 */
+		if (expr->expr_rwopt == PLPGSQL_RWOPT_UNKNOWN)
+			exec_check_rw_parameter(expr, op->d.cparam.paramid);
+
+		/*
+		 * Update the callback pointer to match what we decided to do, and
+		 * pass off this execution to the selected function.
+		 */
+		switch (expr->expr_rwopt)
+		{
+			case PLPGSQL_RWOPT_UNKNOWN:
+				Assert(false);
+				break;
+			case PLPGSQL_RWOPT_NOPE:
+				/* Force the value to read-only in all future executions */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+				plpgsql_param_eval_var_ro(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_TRANSFER:
+				/* There can be only one matching Param in this case */
+				Assert(param == expr->expr_rw_param);
+				/* When the value is read/write, transfer to exec context */
+				op->d.cparam.paramfunc = plpgsql_param_eval_var_transfer;
+				plpgsql_param_eval_var_transfer(state, op, econtext);
+				break;
+			case PLPGSQL_RWOPT_INPLACE:
+				if (param == expr->expr_rw_param)
+				{
+					/* When the value is read/write, deliver it as-is */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var;
+					plpgsql_param_eval_var(state, op, econtext);
+				}
+				else
+				{
+					/* Not the optimizable reference, so force to read-only */
+					op->d.cparam.paramfunc = plpgsql_param_eval_var_ro;
+					plpgsql_param_eval_var_ro(state, op, econtext);
+				}
+				break;
+		}
+		return;
+	}
+
+	/*
+	 * Otherwise, continue to postpone that decision, and execute an inlined
+	 * version of exec_eval_datum().  Although this value could potentially
+	 * need MakeExpandedObjectReadOnly, we know it doesn't right now.
+	 */
+	*op->resvalue = var->value;
+	*op->resnull = var->isnull;
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
+/*
+ * plpgsql_param_eval_var_transfer		evaluation of EEOP_PARAM_CALLBACK step
+ *
+ * This is specialized to the case of DTYPE_VAR variables for which
+ * we have determined that a read/write expanded value can be handed off
+ * into execution of the expression (and then possibly returned to our
+ * function's ownership afterwards).  We have to test though, because the
+ * variable might not contain a read/write expanded value during this
+ * execution.
+ */
+static void
+plpgsql_param_eval_var_transfer(ExprState *state, ExprEvalStep *op,
+								ExprContext *econtext)
+{
+	ParamListInfo params;
+	PLpgSQL_execstate *estate;
+	int			dno = op->d.cparam.paramid - 1;
+	PLpgSQL_var *var;
+
+	/* fetch back the hook data */
+	params = econtext->ecxt_param_list_info;
+	estate = (PLpgSQL_execstate *) params->paramFetchArg;
+	Assert(dno >= 0 && dno < estate->ndatums);
+
+	/* now we can access the target datum */
+	var = (PLpgSQL_var *) estate->datums[dno];
+	Assert(var->dtype == PLPGSQL_DTYPE_VAR);
+
+	/*
+	 * If the variable's current value is a R/W expanded object, transfer its
+	 * ownership into the expression execution context, then drop our own
+	 * reference to the value by setting the variable to NULL.  That'll be
+	 * overwritten (perhaps with this same object) when control comes back
+	 * from the expression.
+	 */
+	if (!var->isnull &&
+		VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
+	{
+		*op->resvalue = TransferExpandedObject(var->value,
+											   get_eval_mcontext(estate));
+		*op->resnull = false;
+
+		var->value = (Datum) 0;
+		var->isnull = true;
+		var->freeval = false;
+	}
+	else
+	{
+		/*
+		 * Otherwise we can pass the variable's value directly; we now know
+		 * that MakeExpandedObjectReadOnly isn't needed.
+		 */
+		*op->resvalue = var->value;
+		*op->resnull = var->isnull;
+	}
+
+	/* safety check -- an assertion should be sufficient */
+	Assert(var->datatype->typoid == op->d.cparam.paramtype);
+}
+
 /*
  * plpgsql_param_eval_var		evaluation of EEOP_PARAM_CALLBACK step
  *
@@ -7957,9 +8140,10 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 	MemoryContext oldcontext;
 
 	/*
-	 * Initialize to "not simple".
+	 * Initialize to "not simple", and reset R/W optimizability.
 	 */
 	expr->expr_simple_expr = NULL;
+	expr->expr_rwopt = PLPGSQL_RWOPT_UNKNOWN;
 	expr->expr_rw_param = NULL;
 
 	/*
@@ -8164,88 +8348,133 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
 	expr->expr_simple_typmod = exprTypmod((Node *) tle_expr);
 	/* We also want to remember if it is immutable or not */
 	expr->expr_simple_mutable = contain_mutable_functions((Node *) tle_expr);
-
-	/*
-	 * Lastly, check to see if there's a possibility of optimizing a
-	 * read/write parameter.
-	 */
-	exec_check_rw_parameter(expr);
 }
 
 /*
  * exec_check_rw_parameter --- can we pass expanded object as read/write param?
  *
- * If we have an assignment like "x := array_append(x, foo)" in which the
+ * There are two separate cases in which we can optimize an update to a
+ * variable that has a read/write expanded value by letting the called
+ * expression operate directly on the expanded value.  In both cases we
+ * are considering assignments like "var := array_append(var, foo)" where
+ * the assignment target is also an input to the RHS expression.
+ *
+ * Case 1 (RWOPT_TRANSFER rule): if the variable is "local" in the sense that
+ * its declaration is not outside any BEGIN...EXCEPTION block surrounding the
+ * assignment, then we do not need to worry about preserving its value if the
+ * RHS expression throws an error.  If in addition the variable is referenced
+ * exactly once in the RHS expression, then we can optimize by converting the
+ * read/write expanded value into a transient value within the expression
+ * evaluation context, and then setting the variable's recorded value to NULL
+ * to prevent double-free attempts.  This works regardless of any other
+ * details of the RHS expression.  If the expression eventually returns that
+ * same expanded object (possibly modified) then the variable will re-acquire
+ * ownership; while if it returns something else or throws an error, the
+ * expanded object will be discarded as part of cleanup of the evaluation
+ * context.
+ *
+ * Case 2 (RWOPT_INPLACE rule): if we have a non-local assignment or if
+ * it looks like "var := array_append(var, var[1])" with multiple references
+ * to the target variable, then we can't use case 1.  Nonetheless, if the
  * top-level function is trusted not to corrupt its argument in case of an
- * error, then when x has an expanded object as value, it is safe to pass the
- * value as a read/write pointer and let the function modify the value
- * in-place.
+ * error, then when the var has an expanded object as value, it is safe to
+ * pass the value as a read/write pointer to the top-level function and let
+ * the function modify the value in-place.  (Any other references have to be
+ * passed as read-only pointers as usual.)  Only the top-level function has to
+ * be trusted, since if anything further down fails, the object hasn't been
+ * modified yet.
  *
- * This function checks for a safe expression, and sets expr->expr_rw_param
- * to the address of any Param within the expression that can be passed as
- * read/write (there can be only one); or to NULL when there is no safe Param.
+ * This function checks to see if the assignment is optimizable according
+ * to either rule, and updates expr->expr_rwopt accordingly.  In addition,
+ * it sets expr->expr_rw_param to the address of the Param within the
+ * expression that can be passed as read/write (there can be only one);
+ * or to NULL when there is no safe Param.
  *
- * Note that this mechanism intentionally applies the safety labeling to just
- * one Param; the expression could contain other Params referencing the target
- * variable, but those must still be treated as read-only.
+ * Note that this mechanism intentionally allows just one Param to emit a
+ * read/write pointer; in case 2, the expression could contain other Params
+ * referencing the target variable, but those must be treated as read-only.
  *
  * Also note that we only apply this optimization within simple expressions.
  * There's no point in it for non-simple expressions, because the
  * exec_run_select code path will flatten any expanded result anyway.
- * Also, it's safe to assume that an expr_simple_expr tree won't get copied
- * somewhere before it gets compiled, so that looking for pointer equality
- * to expr_rw_param will work for matching the target Param.  That'd be much
- * shakier in the general case.
  */
 static void
-exec_check_rw_parameter(PLpgSQL_expr *expr)
+exec_check_rw_parameter(PLpgSQL_expr *expr, int paramid)
 {
-	int			target_dno;
+	Expr	   *sexpr = expr->expr_simple_expr;
 	Oid			funcid;
 	List	   *fargs;
 	ListCell   *lc;
 
 	/* Assume unsafe */
+	expr->expr_rwopt = PLPGSQL_RWOPT_NOPE;
 	expr->expr_rw_param = NULL;
 
-	/* Done if expression isn't an assignment source */
-	target_dno = expr->target_param;
-	if (target_dno < 0)
-		return;
+	/* Shouldn't be here for non-simple expression */
+	Assert(sexpr != NULL);
+
+	/* Param should match the expression's assignment target, too */
+	Assert(paramid == expr->target_param + 1);
 
 	/*
-	 * If target variable isn't referenced by expression, no need to look
-	 * further.
+	 * If the assignment is to a "local" variable (one whose value won't
+	 * matter anymore if expression evaluation fails), and this Param is the
+	 * only reference to that variable in the expression, then we can
+	 * unconditionally optimize using the "transfer" method.
 	 */
-	if (!bms_is_member(target_dno, expr->paramnos))
-		return;
+	if (expr->target_is_local)
+	{
+		count_param_references_context context;
 
-	/* Shouldn't be here for non-simple expression */
-	Assert(expr->expr_simple_expr != NULL);
+		/* See how many references there are, and find one of them */
+		context.paramid = paramid;
+		context.count = 0;
+		context.last_param = NULL;
+		(void) count_param_references((Node *) sexpr, &context);
+
+		/* If we're here, the expr must contain some reference to the var */
+		Assert(context.count > 0);
+
+		/* If exactly one reference, success! */
+		if (context.count == 1)
+		{
+			expr->expr_rwopt = PLPGSQL_RWOPT_TRANSFER;
+			expr->expr_rw_param = context.last_param;
+			return;
+		}
+	}
 
 	/*
+	 * Otherwise, see if we can trust the expression's top-level function to
+	 * apply the "inplace" method.
+	 *
 	 * Top level of expression must be a simple FuncExpr, OpExpr, or
-	 * SubscriptingRef, else we can't optimize.
+	 * SubscriptingRef, else we can't identify which function is relevant. But
+	 * it's okay to look through any RelabelType above that, since that can't
+	 * fail.
 	 */
-	if (IsA(expr->expr_simple_expr, FuncExpr))
+	if (IsA(sexpr, RelabelType))
+		sexpr = ((RelabelType *) sexpr)->arg;
+	if (IsA(sexpr, FuncExpr))
 	{
-		FuncExpr   *fexpr = (FuncExpr *) expr->expr_simple_expr;
+		FuncExpr   *fexpr = (FuncExpr *) sexpr;
 
 		funcid = fexpr->funcid;
 		fargs = fexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, OpExpr))
+	else if (IsA(sexpr, OpExpr))
 	{
-		OpExpr	   *opexpr = (OpExpr *) expr->expr_simple_expr;
+		OpExpr	   *opexpr = (OpExpr *) sexpr;
 
 		funcid = opexpr->opfuncid;
 		fargs = opexpr->args;
 	}
-	else if (IsA(expr->expr_simple_expr, SubscriptingRef))
+	else if (IsA(sexpr, SubscriptingRef))
 	{
-		SubscriptingRef *sbsref = (SubscriptingRef *) expr->expr_simple_expr;
+		SubscriptingRef *sbsref = (SubscriptingRef *) sexpr;
 
 		/* We only trust standard varlena arrays to be safe */
+		/* TODO: install some extensibility here */
 		if (get_typsubscript(sbsref->refcontainertype, NULL) !=
 			F_ARRAY_SUBSCRIPT_HANDLER)
 			return;
@@ -8256,9 +8485,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) sbsref->refexpr;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8293,9 +8523,10 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 			Param	   *param = (Param *) arg;
 
 			if (param->paramkind == PARAM_EXTERN &&
-				param->paramid == target_dno + 1)
+				param->paramid == paramid)
 			{
 				/* Found the Param we want to pass as read/write */
+				expr->expr_rwopt = PLPGSQL_RWOPT_INPLACE;
 				expr->expr_rw_param = param;
 				return;
 			}
@@ -8303,6 +8534,36 @@ exec_check_rw_parameter(PLpgSQL_expr *expr)
 	}
 }
 
+/*
+ * Count Params referencing the specified paramid, and return one of them
+ * if there are any.
+ *
+ * We actually only need to distinguish 0, 1, and N references; so we can
+ * abort the tree traversal as soon as we've found two.
+ */
+static bool
+count_param_references(Node *node, count_param_references_context *context)
+{
+	if (node == NULL)
+		return false;
+	else if (IsA(node, Param))
+	{
+		Param	   *param = (Param *) node;
+
+		if (param->paramkind == PARAM_EXTERN &&
+			param->paramid == context->paramid)
+		{
+			context->last_param = param;
+			if (++(context->count) > 1)
+				return true;	/* abort tree traversal */
+		}
+		return false;
+	}
+	else
+		return expression_tree_walker(node, count_param_references,
+									  (void *) context);
+}
+
 /*
  * exec_check_assignable --- is it OK to assign to the indicated datum?
  *
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index c6fadc5660..3bafeea28b 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -187,6 +187,17 @@ typedef enum PLpgSQL_resolve_option
 	PLPGSQL_RESOLVE_COLUMN,		/* prefer table column to plpgsql var */
 } PLpgSQL_resolve_option;
 
+/*
+ * Status of optimization of assignment to a read/write expanded object
+ */
+typedef enum PLpgSQL_rwopt
+{
+	PLPGSQL_RWOPT_UNKNOWN = 0,	/* applicability not determined yet */
+	PLPGSQL_RWOPT_NOPE,			/* cannot do any optimization */
+	PLPGSQL_RWOPT_TRANSFER,		/* transfer the old value into expr state */
+	PLPGSQL_RWOPT_INPLACE,		/* pass value as R/W to top-level function */
+} PLpgSQL_rwopt;
+
 
 /**********************************************************************
  * Node and structure definitions
@@ -246,11 +257,14 @@ typedef struct PLpgSQL_expr
 	bool		expr_simple_mutable;	/* true if simple expr is mutable */
 
 	/*
-	 * If we match a Param within expr_simple_expr to the variable identified
-	 * by target_param, that Param's address is stored in expr_rw_param; then
-	 * expression code generation will allow the value for that Param to be
-	 * passed as a read/write expanded-object pointer.
+	 * expr_rwopt tracks whether we have determined that assignment to a
+	 * read/write expanded object (stored in the target_param datum) can be
+	 * optimized by passing it to the expr as a read/write expanded-object
+	 * pointer.  If so, expr_rw_param identifies the specific Param that
+	 * should emit a read/write pointer; any others will emit read-only
+	 * pointers.
 	 */
+	PLpgSQL_rwopt expr_rwopt;	/* can we apply R/W optimization? */
 	Param	   *expr_rw_param;	/* read/write Param within expr, if any */
 
 	/*
diff --git a/src/pl/plpgsql/src/sql/plpgsql_array.sql b/src/pl/plpgsql/src/sql/plpgsql_array.sql
index 4b9ff51594..4a346203dc 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_array.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_array.sql
@@ -48,6 +48,15 @@ begin a.c1[1].i := 11; raise notice 'a = %, a.c1[1].i = %', a, a.c1[1].i; end$$;
 do $$ declare a int[];
 begin a := array_agg(x) from (values(1),(2),(3)) v(x); raise notice 'a = %', a; end$$;
 
+do $$ declare a int[] := array[1,2,3];
+begin
+  -- test scenarios for optimization of updates of R/W expanded objects
+  a := array_append(a, 42);  -- optimizable using "transfer" method
+  a := a || a[3];  -- optimizable using "inplace" method
+  a := a || a;     -- not optimizable
+  raise notice 'a = %', a;
+end$$;
+
 create temp table onecol as select array[1,2] as f1;
 
 do $$ declare a int[];
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2..ea1735386f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1866,6 +1866,7 @@ PLpgSQL_rec
 PLpgSQL_recfield
 PLpgSQL_resolve_option
 PLpgSQL_row
+PLpgSQL_rwopt
 PLpgSQL_stmt
 PLpgSQL_stmt_assert
 PLpgSQL_stmt_assign
@@ -3392,6 +3393,7 @@ core_yy_extra_type
 core_yyscan_t
 corrupt_items
 cost_qual_eval_context
+count_param_references_context
 cp_hash_func
 create_upper_paths_hook_type
 createdb_failure_params
-- 
2.43.5



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-02 00:53  Michel Pelletier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Michel Pelletier @ 2024-11-02 00:53 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

On Fri, Nov 1, 2024 at 3:27 PM Tom Lane <[email protected]> wrote:

> Michel Pelletier <[email protected]> writes:
>
> Here is a v1 patch series that does the first part of what we've been
> talking about, which is to implement the new optimization rule for
> the case of a single RHS reference to the target variable.  I'm
> curious to know if it helps you at all by itself.  You'll definitely
> also need commit 534d0ea6c, so probably applying these on our git
> master branch would be the place to start.
>

I'll apply these tonight and get back to you asap.  There are many
functions in my API that take only one expanded RHS argument, so I'll look
for some cases where your changes reduce expansions when I run my tests.

Thank you!

-Michel


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-11 21:27  Michel Pelletier <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Michel Pelletier @ 2024-11-11 21:27 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

On Fri, Nov 1, 2024 at 5:53 PM Michel Pelletier <[email protected]>
wrote:

> On Fri, Nov 1, 2024 at 3:27 PM Tom Lane <[email protected]> wrote:
>
>> Michel Pelletier <[email protected]> writes:
>>
>> Here is a v1 patch series that does the first part of what we've been
>> talking about, which is to implement the new optimization rule for
>> the case of a single RHS reference to the target variable.  I'm
>> curious to know if it helps you at all by itself.  You'll definitely
>> also need commit 534d0ea6c, so probably applying these on our git
>> master branch would be the place to start.
>>
>
> I'll apply these tonight and get back to you asap.  There are many
> functions in my API that take only one expanded RHS argument, so I'll look
> for some cases where your changes reduce expansions when I run my tests.
>

I tested these patches with my test setup and can confirm there is now one
less expansion in this function:

create or replace function test_expand(graph matrix) returns matrix
language plpgsql as
    $$
    declare
        nvals bigint = nvals(graph);
    begin
        return graph;
    end;
    $$;

postgres=# select test_expand(a) from test_fixture ;
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free
DEBUG:  matrix_out
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free

The second expansion in matrix_out happens outside the function, so inside
there is only the one expansion for both matrix_nvals and the assignment.
Thank you!  All my tests continue to pass and the change seems to work well.

Looking forward to helping test the support function idea, let me know if
there's anything else I can do to validate the idea.

-Michel

>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-11 21:34  Michel Pelletier <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Michel Pelletier @ 2024-11-11 21:34 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

>
> The second expansion in matrix_out happens outside the function, so inside
> there is only the one expansion for both matrix_nvals and the assignment.
> Thank you!  All my tests continue to pass and the change seems to work well.


Hmm, well I spoke to soon looking at the wrong function without an
assignment, I've been down in gdb a bit too much and I got mixed up, here's
another function that wraps it and does an assignment, and it looks like
there is another expansion happening after the nvals but before the
assignment:

create or replace function test_expand_expand(graph matrix) returns matrix
language plpgsql as
    $$
    declare
        nvals bigint = nvals(graph);
    begin
        graph = test_expand(graph);
        return test_expand(graph);
    end;
    $$;

postgres=# select test_expand_expand(a) from test_fixture ;
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free
DEBUG:  matrix_out
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free

So going back on the assumption I'm somehow not returning the right
pointer, but digging into the array code I'm pretty sure I'm following the
same pattern and tracing my code path is calling EOHPGetRWDatum when I
return the object, I can't get it to stop on DeleteExpandedObject, so I
don't think plpgsql is deleting it, I'm going to keep investigating, sorry
for the noise.

-Michel

>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-19 17:50  Michel Pelletier <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Michel Pelletier @ 2024-11-19 17:50 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

>
>
> So going back on the assumption I'm somehow not returning the right
> pointer, but digging into the array code I'm pretty sure I'm following the
> same pattern and tracing my code path is calling EOHPGetRWDatum when I
> return the object, I can't get it to stop on DeleteExpandedObject, so I
> don't think plpgsql is deleting it, I'm going to keep investigating, sorry
> for the noise.
>

A couple years ago I tried to compress what I learned about expanded
objects into a dummy extension that just provides the necessary
boilerplate.  It wasn't great but a start:

https://github.com/michelp/pgexpanded

Pavel Stehule indicated this might be a good example to put into contrib:

https://www.postgresql.org/message-id/[email protected]...

I've updated the repo to include points from our current discussion wrt
multiple expansions in a plpgsql function, and added some test functions.
The new pgexpanded type just keeps track of the number of times it's been
expanded starting from an initialized value.  While it only stores a
flattened integer, it follows the typical pattern of pallocing the value
and pfreeing it with a memory context callback, so it should be covering
most of the boilerplate needed to start a new expanded type.

This would also be a good place to showcase and test the support function
feature you've described to me.

It occurs to me that including this example in contrib would be an easier
way for us to collaborate on my existing issue instead of punting back and
forth on the list and would benefit all future expanded object developers.
Do you think that's a good idea?  If this were in contrib you could access
the code I'm working with directly and I can just follow along in my
project.

-Michel


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Using Expanded Objects other than Arrays from plpgsql
@ 2024-11-19 18:44  Pavel Stehule <[email protected]>
  parent: Michel Pelletier <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Pavel Stehule @ 2024-11-19 18:44 UTC (permalink / raw)
  To: Michel Pelletier <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

út 19. 11. 2024 v 18:51 odesílatel Michel Pelletier <
[email protected]> napsal:

>
>> So going back on the assumption I'm somehow not returning the right
>> pointer, but digging into the array code I'm pretty sure I'm following the
>> same pattern and tracing my code path is calling EOHPGetRWDatum when I
>> return the object, I can't get it to stop on DeleteExpandedObject, so I
>> don't think plpgsql is deleting it, I'm going to keep investigating, sorry
>> for the noise.
>>
>
> A couple years ago I tried to compress what I learned about expanded
> objects into a dummy extension that just provides the necessary
> boilerplate.  It wasn't great but a start:
>
> https://github.com/michelp/pgexpanded
>
> Pavel Stehule indicated this might be a good example to put into contrib:
>
>
> https://www.postgresql.org/message-id/[email protected]...
>
> I've updated the repo to include points from our current discussion wrt
> multiple expansions in a plpgsql function, and added some test functions.
> The new pgexpanded type just keeps track of the number of times it's been
> expanded starting from an initialized value.  While it only stores a
> flattened integer, it follows the typical pattern of pallocing the value
> and pfreeing it with a memory context callback, so it should be covering
> most of the boilerplate needed to start a new expanded type.
>
> This would also be a good place to showcase and test the support function
> feature you've described to me.
>
> It occurs to me that including this example in contrib would be an easier
> way for us to collaborate on my existing issue instead of punting back and
> forth on the list and would benefit all future expanded object developers.
> Do you think that's a good idea?  If this were in contrib you could access
> the code I'm working with directly and I can just follow along in my
> project.
>

another position can be src/test/modules - I think so your example is
"similar" to plsample

Regards

Pavel


>
> -Michel
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread


end of thread, other threads:[~2024-11-19 18:44 UTC | newest]

Thread overview: 11+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-10-23 16:04 Re: Using Expanded Objects other than Arrays from plpgsql Tom Lane <[email protected]>
2024-10-24 02:10 ` Michel Pelletier <[email protected]>
2024-10-24 02:38   ` Tom Lane <[email protected]>
2024-10-24 18:32     ` Tom Lane <[email protected]>
2024-10-31 23:51       ` Michel Pelletier <[email protected]>
2024-11-01 22:27         ` Tom Lane <[email protected]>
2024-11-02 00:53           ` Michel Pelletier <[email protected]>
2024-11-11 21:27             ` Michel Pelletier <[email protected]>
2024-11-11 21:34               ` Michel Pelletier <[email protected]>
2024-11-19 17:50                 ` Michel Pelletier <[email protected]>
2024-11-19 18:44                   ` Pavel Stehule <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox