public inbox for [email protected]  
help / color / mirror / Atom feed
From: Amit Langote <[email protected]>
To: Evan Montgomery-Recht <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
Date: Wed, 8 Apr 2026 10:23:59 +0900
Message-ID: <CA+HiwqFK8rXTNknxV0MMe++7W08g3kY6eeLK21A1xCrK2Wuk8Q@mail.gmail.com> (raw)
In-Reply-To: <CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com>
References: <CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com>
	<CA+HiwqGM6nvAV5O+=Nr+BXMPWOma0oeCRVzVP0XiLE8zX5TVAg@mail.gmail.com>
	<CA+HiwqGMaovCUgDbGxVGnK0Mrivr+ph3YE2Ws+47-ugyPb4f7g@mail.gmail.com>
	<CAFj8pRDaiBe_GOLk_yyYHTtPiDAAaLOM8u1-=Q3ZgXBTH+1igg@mail.gmail.com>
	<CA+HiwqGA5Ay_MR0eJEEbt4j6WrVh4F+AasTp8yCbs5aJLOJn6Q@mail.gmail.com>
	<CAEG8a3JM=NoqiTK0V6S9FNxZPvy1+C5F7rfafTtPKBVWnunL-g@mail.gmail.com>
	<CA+HiwqEyiLCY6MTLbOJXDdLNNQLaURYHvdW797MQgbjEK9od4Q@mail.gmail.com>
	<CAEG8a3+VBpwPf1Rm-ECD90whM9b3YnGhux5CVXdsL6khiBfzRQ@mail.gmail.com>
	<CA+HiwqF2UHzF0sKCp-F2a-U29rqh_9ZPy=f1h+Fh_=M8efj3pg@mail.gmail.com>
	<CAEG8a3L9Ew-WL8sxLROVOcypeaENPmd8qCmMvz4geoGL1TDGCA@mail.gmail.com>
	<CAEG8a3+nUFQo4sdPQF9xy0J73J8RFJ5U9A5+_kMosGDaZ+1sXA@mail.gmail.com>
	<[email protected]>
	<CAEG8a3JyKdizWvYsF+z_mA1BKy=dpW11iKVMOG=bk6Tbz6M1Bw@mail.gmail.com>
	<CAEG8a3+Hf4tvvbts29_k_AFhWQmRYfEo_SW4C5FY_140iKghBw@mail.gmail.com>
	<CA+HiwqFV-PY-3BxM6j5TaAiC3AwedDxo-6vwRSbvygg3zF+xAQ@mail.gmail.com>
	<CA+HiwqHpaisS-e+0gAgzh92qZAFxncAJMmmTRZZN=efoeTPgOg@mail.gmail.com>
	<CA+HiwqFwZ6WLRbY8R7VC7JVp5Jot6ktZOkr9wDxTjoK=W1SgdQ@mail.gmail.com>
	<CA+HiwqF_Kz=R8juHJBiOATvabWSOugK4VaGOxoJ_n=E2c7UM9w@mail.gmail.com>
	<CA+HiwqHCB7kcbspkhaLN9enoj5x=ehzhFM4PXDgWUUP8Px41GA@mail.gmail.com>
	<[email protected]>
	<CA+HiwqHpVtP485wEKuXdOkdoZWhvVvfFH40-og07Jp3MPx21eg@mail.gmail.com>
	<CAEG8a3JWHkJSXe9nNcVK7wnYKUEqWuMGFDhy5BynB_9tEjmEUg@mail.gmail.com>
	<CA+HiwqFjfumKrWy03q5M309xJJVYt0WgGfH6AZ8BjFhSwppwsQ@mail.gmail.com>
	<CAEG8a3JjP1LaKSv-r3AMJLRyLMzENJrKshWsDvDouMPM_sizmA@mail.gmail.com>
	<CA+HiwqFQ+ZA7hSOygv4uv_t75B3r0_gosjadetCsAEoaZwTu6g@mail.gmail.com>
	<CA+HiwqHdB0r8z6Mut13BxpYNq2W-os+Arju4mcZbCyU9PeaVog@mail.gmail.com>
	<CAEG8a3K5ayVNkCDnK9OtNb+4OY0chJtW6ObgEOSFjqyymQda8Q@mail.gmail.com>
	<CA+HiwqGJYCgEs_F-LBtrRdx-Y969LMr-OVogjFXU6U-0q5bOwQ@mail.gmail.com>
	<CA+HiwqF2Ma6R_QyjfPBvFreYbezFpGcwASJE0a2bM+Y0jvof+g@mail.gmail.com>
	<[email protected]>
	<CA+HiwqGFRAH2O5bGpNMErFopFyn-2-Zu3+5y+BFQim9TE8z+Pw@mail.gmail.com>
	<[email protected]>
	<CA+HiwqFx=aciJYkkaviyTutUm303QXz6GtXSqzG7nfd4MAzddQ@mail.gmail.com>
	<CA+HiwqH39QU7vGVx65JH1e0nzVvQc5eVmuY7=qyj0T_+b-HO3A@mail.gmail.com>
	<[email protected]>
	<CA+HiwqF+jAyHUiNzpR+vRBbpeCiVAdFU52-ffTGko9Zit317oA@mail.gmail.com>
	<CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com>

Hi Evan,

On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
<[email protected]> wrote:
>
> Hi Amit,
>
> First time contributing to this project, let me know if I missed
> something or need to adjust what I put together.
>
> I found a crash in the RI fast-path FK check code introduced by
> 2da86c1ef9b and extended by b7b27eb41a5. C-language extensions that
> use SPI to INSERT into tables with multiple FK constraints hit an
> assertion failure (or, without assertions, a server crash) when the
> batch callback fires. I discovered this via PostGIS's topology CI --
> toTopoGeom() uses SPI to insert into edge_data, which has 4 immediate
> FK constraints referencing node and face. PG 18 passes the same test;
> the master crashes.
>
> This appears to be a separate issue from Chao Li's deferred-trigger
> batching bug; the patches touch different files and don't conflict.  I
> did do a regression test on the merge referenced both PostgreSQL and
> PostGIS (to validate that this works.)
>
> The problem: ri_FastPathGetEntry() opens pk_rel/idx_rel and creates
> TupleTableSlots, registering them with the current resource owner --
> the SPI portal's. The batch callback ri_FastPathTeardown() only fires
> when query_depth == 0 (via FireAfterTriggerBatchCallbacks), but by
> that time the inner portal has finished and its resource owner has
> released the relation references and TupleDesc pins. The teardown then
> tries to close relations whose refcounts are already zero:
>
> TRAP: failed Assert("rel->rd_refcnt > 0"), File: "relcache.c"
>
> RelationDecrementReferenceCount
> -> RelationClose -> index_close -> ri_FastPathTeardown
> -> ri_FastPathEndBatch -> FireAfterTriggerBatchCallbacks
> -> AfterTriggerEndQuery -> standard_ExecutorFinish
>
> and TupleDesc pins that are no longer tracked by any resource owner
> cause "tupdesc reference not owned by resource owner" errors.
>
> Note that simple PL/pgSQL functions don't trigger this because
> PL/pgSQL's SPI connection spans the entire function call, so the
> portal's resource owner outlives the batch callback. The crash
> requires nested SPI from a C extension, which creates a shorter-lived
> portal.
>
> The attached patch (against master, for application) fixes this by
> transferring both relation references and TupleDesc pins from the
> current resource owner to CurTransactionResourceOwner immediately
> after creating them in ri_FastPathGetEntry(). The transfer uses
> RelationIncrementReferenceCount / PinTupleDesc under the target owner
> followed by RelationDecrementReferenceCount / ReleaseTupleDesc under
> the original. I chose this over switching CurrentResourceOwner around
> the table_open/index_open calls because the latter also affects
> transient buffer pins acquired during catalog lookups inside those
> functions.
>
> ri_FastPathTeardown is updated to clear any buffered tuples (whose
> buffer pins belong to the current resource owner) before switching to
> CurTransactionResourceOwner for the close/drop operations.
>
> The patch also adds a test module (test_spi_func) with a C function
> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
> crash cannot be triggered from PL/pgSQL. The test exercises the
> C-level SPI INSERT with multiple FK constraints, FK violations, and
> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
>
> This is purely a correctness fix with no performance or backward
> compatibility impact. No documentation changes are needed since this
> is an internal bug fix.
>
> The patch compiles cleanly and passes pgindent, clang-tidy, and
> cppcheck. All 247 regression subtests pass, along with the full meson
> test suite (370 ok, 0 fail, 21 skipped) (skipped due to hardware
> availability on my side this week). I also verified the PostGIS
> topology test (toTopoGeom) passes clean with no warnings, and tested
> abort paths (FK violation, transaction rollback, subtransaction abort
> via EXCEPTION blocks) (not in scope of PostgreSQL but more for my own
> verification that things work). Code coverage on the new lines is
> 100%. Tested on macOS (aarch64) and Linux (aarch64, via Docker).

Thanks for the report and the patch.

I'll need to study this one a bit more closely.  Added an open item
for the time being:
https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items

> Unrelated to my patch, SonarCloud flagged a potential issue in
> recheck_matched_pk_tuple() (line 3370): the function loops over
> ii_NumIndexKeyAttrs elements of the skeys array, but the caller in
> ri_FastPathFlushArray passes recheck_skey[1] -- an array of exactly
> one element. This is safe because ri_FastPathFlushArray is the
>
> single-column FK path, so ii_NumIndexKeyAttrs is always 1 there.
> However, the function signature doesn't communicate this constraint,
> which flags as CWE-125 (out-of-bounds read) / CERT C ARR30-C. Adding
> an nkeys parameter (like ri_FastPathProbeOne already has) would make
> the contract explicit.

Makes sense.  Will push the attached patch for this.

-- 
Thanks, Amit Langote


Attachments:

  [application/octet-stream] v1-0001-Add-nkeys-parameter-to-recheck_matched_pk_tuple.patch (3.0K, 2-v1-0001-Add-nkeys-parameter-to-recheck_matched_pk_tuple.patch)
  download | inline diff:
From 344ba694451fbaa57bbb2e12930fe20fd425e806 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 10:20:15 +0900
Subject: [PATCH v1] Add nkeys parameter to recheck_matched_pk_tuple()

The function looped over ii_NumIndexKeyAttrs elements of the skeys
array, but one caller (ri_FastPathFlushArray) passes a one-element
array since it only handles single-column FKs.  The function
signature did not communicate this constraint, which static analysis
flags as a potential out-of-bounds read.

Add an nkeys parameter and assert that it matches
ii_NumIndexKeyAttrs, then use it in the loop.  The call sites
already know the key count.

Reported-by: Evan Montgomery-Recht <[email protected]>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 84f9fecdb4c..18ec858357d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -329,7 +329,7 @@ static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
 static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
 static void ri_CheckPermissions(Relation query_rel);
 static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
-									 TupleTableSlot *new_slot);
+									 int nkeys, TupleTableSlot *new_slot);
 static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
 								 Relation idx_rel, Datum *pk_vals,
 								 char *pk_nulls, ScanKey skeys);
@@ -3138,7 +3138,7 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
 								   idx_rel->rd_indcollation[0],
 								   fpmeta->regops[0],
 								   found_val);
-			if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+			if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, 1, pk_slot))
 				continue;
 		}
 
@@ -3193,7 +3193,7 @@ ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
 						   &concurrently_updated))
 		{
 			if (concurrently_updated)
-				found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+				found = recheck_matched_pk_tuple(idx_rel, skey, nkeys, slot);
 			else
 				found = true;
 		}
@@ -3340,7 +3340,7 @@ ri_CheckPermissions(Relation query_rel)
  * not found.
  */
 static bool
-recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys, int nkeys,
 						 TupleTableSlot *new_slot)
 {
 	/*
@@ -3359,8 +3359,9 @@ recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
 		   indexInfo->ii_ExclusionOps == NULL);
 
 	/* Form the index values and isnull flags given the table tuple. */
+	Assert(nkeys == indexInfo->ii_NumIndexKeyAttrs);
 	FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
-	for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+	for (int i = 0; i < nkeys; i++)
 	{
 		ScanKeyData *skey = &skeys[i];
 
-- 
2.47.3



view thread (63+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
  In-Reply-To: <CA+HiwqFK8rXTNknxV0MMe++7W08g3kY6eeLK21A1xCrK2Wuk8Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox