Re: PG19 FK fast path: OOB write and missed FK checks during batched

From: Amit Langote <[email protected]>
To: Ayush Tiwari <[email protected]>
Cc: Nikolay Samokhvalov <[email protected]>
Cc: pgsql-hackers mailing list <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: Kirk Wolak <[email protected]>
Subject: Re: PG19 FK fast path: OOB write and missed FK checks during batched
Date: Wed, 10 Jun 2026 21:17:07 +0900
Message-ID: <CA+HiwqELE-eyOfBBEmpr_eGf-04PUvZg5BjypW2CMHbed5QGhA@mail.gmail.com>
In-Reply-To: <CAJTYsWVgHCNDZb2F62F+aELnKJO2BWtHaAXcN-AmgPPP+GAnUQ@mail.gmail.com>
References: <CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com>
	<CA+HiwqGTOwRqkgrhqq6-nLyVGfGuAHMfoo+Ob2A4Z98ZkgwCmg@mail.gmail.com>
	<CA+HiwqGWUtxc7KECuT06aYTiwwxGBxM89qY_W64dQjYEoziXog@mail.gmail.com>
	<CA+HiwqHUz50YqJn4XiNsSLN2c+9eYBy1af=y_dfdJTsz5BmbJg@mail.gmail.com>
	<CAM527d_2OpJ3KCOT1QqGh4neCPpgZTgM+VUxTqVgOSweOzTDQw@mail.gmail.com>
	<CA+HiwqFBXTTy3KcfGVxqxkhX5zV99R7=s2EwkxMiiWnVbyTpyw@mail.gmail.com>
	<CAJTYsWVgHCNDZb2F62F+aELnKJO2BWtHaAXcN-AmgPPP+GAnUQ@mail.gmail.com>

Hi Ayush,

Thanks for the review.

On Wed, Jun 10, 2026 at 7:09 PM Ayush Tiwari
<[email protected]> wrote:
> On Wed, 10 Jun 2026 at 14:02, Amit Langote <[email protected]> wrote:
>> Thanks for checking.  I will review them a bit more closely before
>> committing by Friday.  Other reviews are welcome.
>
> Thanks for the patch!
>
> I read through v1-0001 and v1-0002 and tried them locally. I had a couple of
> things I wanted to ask about.
>
> 1. The per-entry "flushing" flag and test coverage.  If I'm reading the two
> patches together correctly, with both applied the 64-row re-entry test in 0001
> reaches the flush through ri_FastPathEndBatch(), where 0002's cache-wide
> ri_fastpath_flushing guard already routes the re-entrant check to the per-row
> path before it gets back into ri_FastPathBatchAdd().  Does that mean the
> per-entry flag from 0001 isn't really exercised by that test once 0002 is in?
> As far as I can tell you'd need the flush to fire from ri_FastPathBatchAdd()
> itself (a 65th row) to reach it.  I tried a 65-row variant (same FK, re-entrant
> DML from the cast during the full-batch flush), including a case where the
> re-entrant row was an orphan, and it seemed to do the right thing; the
> per-row fallback still raised the violation.  Would it be worth switching the
> test to 65 rows, or adding that variant, so the per-entry guard is covered too?
> Or am I missing a path where the committed test already hits it?

You're right. With 0002 applied, the 64-row test reaches the flush
through ri_FastPathEndBatch(), where the cache-wide
ri_fastpath_flushing guard catches the re-entry before it returns to
ri_FastPathBatchAdd(), so the per-entry flag is no longer exercised by
that test. To hit the per-entry flag the flush has to fire from
ri_FastPathBatchAdd() itself, which the 64-row case no longer does
once the add and flush are reordered.

Rather than bump the test to 65 rows, I'd prefer to keep the flush
firing from ri_FastPathBatchAdd() at 64 by not reordering the add and
flush, and prevent the OOB write by bounds-checking the write instead,
as done in the attached updated 0001. A re-entrant add then can't
overrun the array regardless of the flag, the per-entry flushing guard
still routes the re-entry to the per-row path, and a 64-row statement
flushes from ri_FastPathBatchAdd() on the 64th row, so the existing
test exercises the per-entry guard.
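
For illustration only (this is not code from the attached patch), the
control flow described above can be sketched as a small standalone C
model.  The names Entry, batch_add, batch_flush and per_row_check are
hypothetical stand-ins for RI_FastPathEntry, ri_FastPathBatchAdd(),
ri_FastPathBatchFlush() and ri_FastPathCheck(); BATCH_SIZE is shrunk
from 64 to 4, and the "user code" is assumed to re-enter once for every
buffered row whose value is 1.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 4			/* stand-in for RI_FASTPATH_BATCH_SIZE (64) */

/* Hypothetical, simplified model of one fast-path cache entry. */
typedef struct Entry
{
	int			batch[BATCH_SIZE];
	int			batch_count;
	bool		flushing;		/* true while this entry is being flushed */
	int			per_row_checks; /* rows routed to the per-row path */
	int			flushes;		/* completed batch flushes */
} Entry;

void		batch_add(Entry *e, int row);

/* Stand-in for the per-row check; here it only counts calls. */
void
per_row_check(Entry *e, int row)
{
	(void) row;
	e->per_row_checks++;
}

/* Flush the batch; "user code" re-enters batch_add() for rows equal to 1. */
void
batch_flush(Entry *e)
{
	assert(!e->flushing);
	e->flushing = true;
	for (int i = 0; i < e->batch_count; i++)
	{
		if (e->batch[i] == 1)
			batch_add(e, 2);	/* simulated re-entrant DML on the same FK */
	}
	e->flushing = false;
	e->batch_count = 0;
	e->flushes++;
}

void
batch_add(Entry *e, int row)
{
	/* A re-entrant add on a busy entry takes the per-row path. */
	if (e->flushing)
	{
		per_row_check(e, row);
		return;
	}

	/* Bounds-checked write: never store past the end of the array. */
	if (e->batch_count < BATCH_SIZE)
		e->batch[e->batch_count++] = row;

	/* Flush as soon as the batch is full. */
	if (e->batch_count == BATCH_SIZE)
		batch_flush(e);
}
```

In this model, adding BATCH_SIZE rows of value 1 triggers the flush
from batch_add() on the last row, and every re-entrant add during that
flush is counted by per_row_check() instead of touching the array.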

> 2. Resetting ri_fastpath_flushing.  I noticed it's cleared only in the
> PG_FINALLY of ri_FastPathEndBatch(), which does seem to cover the cases I could
> think of.  Since ri_FastPathXactCallback already NULLs ri_fastpath_cache and
> clears ri_fastpath_callback_registered at transaction end, I wondered whether
> it might be worth clearing ri_fastpath_flushing there too, just as cheap
> insurance against some future path that leaves it set across transactions,
> though maybe that's unnecessary given the PG_FINALLY.

Agreed, it's cheap and matches the existing resets there, so I've
added it to ri_FastPathXactCallback() in v2-0002.
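
For illustration only (again, not code from the patch), the two reset
points can be modeled in standalone C.  This sketch assumes setjmp and
longjmp as a rough stand-in for PG_TRY/PG_FINALLY; cache_flushing,
end_batch, flush_all and xact_callback are hypothetical names standing
in for ri_fastpath_flushing, ri_FastPathEndBatch(), the flush loop and
ri_FastPathXactCallback().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

bool		cache_flushing = false; /* stand-in for ri_fastpath_flushing */
static jmp_buf error_ctx;			/* crude stand-in for the ERROR path */

/* Hypothetical flush that "throws" when asked to fail. */
static void
flush_all(bool fail)
{
	if (fail)
		longjmp(error_ctx, 1);
}

/* Returns true if the flush completed, false if it raised an error. */
bool
end_batch(bool fail)
{
	bool		ok = true;

	cache_flushing = true;
	if (setjmp(error_ctx) != 0)
		ok = false;				/* reached via longjmp: the flush failed */
	else
		flush_all(fail);

	/* "Finally" step: clear the flag on both the normal and error paths. */
	cache_flushing = false;
	return ok;
}

/* Transaction-end hook: clears the flag again as cheap insurance. */
void
xact_callback(void)
{
	cache_flushing = false;
}
```

The second reset in xact_callback() is redundant whenever end_batch()
runs to its final step; it only matters if some other path were ever to
leave the flag set.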

> Other than the above queries, the patch looks good to me.

Updated patches attached.

-- 
Thanks, Amit Langote


Attachments:

  [application/octet-stream] v2-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch (10.5K, 2-v2-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch)
From f2979edb939d37e81d8144d18328aafcceb501c5 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 10 Jun 2026 21:07:09 +0900
Subject: [PATCH v2 1/2] Fix out-of-bounds write in RI fast-path batch on
 re-entry

The FK fast-path batching added in b7b27eb41a5 wrote the incoming row
into the batch array before checking whether the array was full:

    fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
    fpentry->batch_count++;
    if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
        ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

batch_count is reset to zero only at the end of ri_FastPathBatchFlush(),
so it remains at RI_FASTPATH_BATCH_SIZE throughout a full-batch flush.
A flush runs user-defined cast functions and equality operators; if that
user code performs DML on the same FK table, ri_FastPathBatchAdd()
re-enters with batch_count == RI_FASTPATH_BATCH_SIZE and writes one past
the end of the array, corrupting the adjacent batch_count field.  This
is reachable by an unprivileged table owner via an implicit cast with a
PL/pgSQL function and causes a SIGSEGV in assert-enabled builds.

Fix by bounds-checking the write into the batch array so a re-entrant
add can never write past the end, and by adding a "flushing" flag to
RI_FastPathEntry that routes re-entrant ri_FastPathBatchAdd() calls on
a busy entry to the per-row path (ri_FastPathCheck) instead of touching
the mid-flush batch array.  The flag is set around the probe in
ri_FastPathBatchFlush() and cleared in a PG_FINALLY, which also resets
batch_count, so the entry is left empty and reusable if a flush error
(including a reported FK violation) is caught by a savepoint.

Add regression tests for both the re-entrant flush and reuse of an entry
after a flush error caught by a savepoint.

Reported-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Ayush Tiwari <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 70 +++++++++++++++++------
 src/test/regress/expected/foreign_key.out | 56 ++++++++++++++++++
 src/test/regress/sql/foreign_key.sql      | 46 +++++++++++++++
 3 files changed, 155 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index dc89c686394..6d0d4204886 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -249,6 +249,12 @@ typedef struct RI_FastPathEntry
 	 */
 	HeapTuple	batch[RI_FASTPATH_BATCH_SIZE];
 	int			batch_count;
+
+	/*
+	 * true while this entry's batch is being flushed; guards against
+	 * re-entrant ri_FastPathBatchAdd from user code run during the flush.
+	 */
+	bool		flushing;
 } RI_FastPathEntry;
 
 /*
@@ -2860,15 +2866,31 @@ ri_FastPathBatchAdd(RI_ConstraintInfo *riinfo,
 					Relation fk_rel, TupleTableSlot *newslot)
 {
 	RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
-	MemoryContext oldcxt;
 
-	oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
-	fpentry->batch[fpentry->batch_count] =
-		ExecCopySlotHeapTuple(newslot);
-	fpentry->batch_count++;
-	MemoryContextSwitchTo(oldcxt);
+	/*
+	 * If this entry is already being flushed, a cast function or an operator
+	 * invoked during the flush has re-entered with DML on the same FK.  Fall
+	 * back to the per-row path rather than touching the batch array, which is
+	 * mid-flush.
+	 */
+	if (fpentry->flushing)
+	{
+		ri_FastPathCheck(riinfo, fk_rel, newslot);
+		return;
+	}
+
+	/* Buffer the row if there is room; otherwise it is flushed below. */
+	if (fpentry->batch_count < RI_FASTPATH_BATCH_SIZE)
+	{
+		MemoryContext oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+		fpentry->batch[fpentry->batch_count] =
+			ExecCopySlotHeapTuple(newslot);
+		fpentry->batch_count++;
+		MemoryContextSwitchTo(oldcxt);
+	}
 
-	if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+	/* Flush as soon as the batch is full. */
+	if (fpentry->batch_count == RI_FASTPATH_BATCH_SIZE)
 		ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
 }
 
@@ -2944,13 +2966,30 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 	}
 	Assert(riinfo->fpmeta);
 
-	/* Skip array overhead for single-row batches. */
-	if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
-		violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
-												fk_rel, snapshot, scandesc);
-	else
-		violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
-											   fk_rel, snapshot, scandesc);
+	/*
+	 * The probe runs user-defined cast and equality functions.  Set the
+	 * flushing flag around it so a re-entrant ri_FastPathBatchAdd on this
+	 * entry takes the per-row path, and clear it even on error so the entry
+	 * is reusable if the error is caught by a savepoint.
+	 */
+	Assert(!fpentry->flushing);
+	fpentry->flushing = true;
+	PG_TRY();
+	{
+		/* Skip array overhead for single-row batches. */
+		if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
+			violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+													fk_rel, snapshot, scandesc);
+		else
+			violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+												   fk_rel, snapshot, scandesc);
+	}
+	PG_FINALLY();
+	{
+		fpentry->flushing = false;
+		fpentry->batch_count = 0;
+	}
+	PG_END_TRY();
 
 	SetUserIdAndSecContext(saved_userid, saved_sec_context);
 	UnregisterSnapshot(snapshot);
@@ -2966,9 +3005,6 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 
 	MemoryContextReset(fpentry->flush_cxt);
 	MemoryContextSwitchTo(oldcxt);
-
-	/* Reset. */
-	fpentry->batch_count = 0;
 }
 
 /*
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 8b3b268de0f..e08dff99f03 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3712,3 +3712,59 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+  a  | count 
+-----+-------
+ (1) |    64
+ (2) |    64
+(2 rows)
+
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+    -- Reuse the same FK with a full batch in the same transaction.  The
+    -- entry must be empty after the caught violation: no stale rows from the
+    -- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+NOTICE:  caught fk violation
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+ count | max 
+-------+-----
+    64 |   1
+(1 row)
+
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 7eb86b188f0..87381194f41 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2680,3 +2680,49 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+    -- Reuse the same FK with a full batch in the same transaction.  The
+    -- entry must be empty after the caught violation: no stale rows from the
+    -- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
-- 
2.47.3



  [application/octet-stream] v2-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch (11.3K, 3-v2-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch)
From dd87a2e45f2f2dbdccd4d65cbd90256b7f72e277 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 10 Jun 2026 21:11:10 +0900
Subject: [PATCH v2 2/2] Confine RI fast-path batching to the top transaction
 level

The FK fast-path batching added in b7b27eb41a5 buffers rows in a
transaction-lived cache (ri_fastpath_cache) keyed by constraint OID.
Running user-defined cast and equality functions during a batch flush,
together with the cache's lifetime and iteration, exposed two defects
reachable by an unprivileged table owner.

First, on subtransaction abort ri_FastPathSubXactCallback discarded the
entire cache.  An entry's batch holds rows buffered by the enclosing
transaction, not just the aborting subxact -- the cache is keyed by
constraint, so a single entry can mix rows from multiple subxact levels.
An internal subxact abort during after-trigger firing (e.g. a PL/pgSQL
BEGIN ... EXCEPTION block) therefore dropped buffered rows of the outer
transaction without running their FK checks, letting orphan rows commit
behind a constraint that still reported itself valid.  The discard also
left relations opened by the batch unclosed, producing "resource was not
closed" warnings.

Second, ri_FastPathEndBatch flushes by iterating the cache with
hash_seq_search.  If flush-time user code inserts into a different
fast-path FK table, a new entry is added to the cache mid-scan; it may
land in a bucket the scan has already passed and never be reached, and
ri_FastPathTeardown then destroys the cache without flushing it,
silently dropping that check.

Cleanly unwinding the cache on subxact abort would require tracking the
originating subxact of each buffered row, since rows from different
levels share an entry (the cache is keyed by constraint) and deferred
constraints cannot be flushed early at a subxact boundary.  Rather than
add that bookkeeping, confine batching to the top transaction level: in
RI_FKey_check, when GetCurrentTransactionNestLevel() > 1, use the
per-row fast path (ri_FastPathCheck) instead of buffering.  Rows checked
inside a subtransaction are then verified immediately and roll back
cleanly with their subtransaction, and the cache only ever holds
top-level rows.  With the cache confined to the top level, a
subtransaction abort has nothing of its own to discard, so
ri_FastPathSubXactCallback is removed along with its registration.

For the second defect, add a cache-wide flag (ri_fastpath_flushing) set
while ri_FastPathEndBatch iterates the cache.  A re-entrant FK check
arriving while the flag is set takes the per-row path rather than adding
an entry to the cache being scanned, so no entry can be missed and torn
down unflushed.  The flag is cleared in a PG_FINALLY so a flush that
throws (a reported violation or an error from user code) does not leave
it stuck.  As defensive insurance it is also cleared in
ri_FastPathXactCallback() at transaction end.

The per-row fast path still bypasses SPI and stays well ahead of the
pre-19 SPI-based check.  A fuller fix that preserves batching across
subtransactions -- whether by tracking the originating subxact of each
buffered row or by per-subxact cache stacks merged into the parent on
commit -- is left for a future release.

The subtransaction-abort case is covered by a new regression test.  The
mid-scan cross-table case depends on hash bucket placement and so is not
reliably reproducible in a portable test, but the flag prevents it by
construction.

Reported-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Ayush Tiwari <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 82 ++++++++++++++++-------
 src/test/regress/expected/foreign_key.out | 24 +++++++
 src/test/regress/sql/foreign_key.sql      | 23 +++++++
 3 files changed, 103 insertions(+), 26 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6d0d4204886..6ad521fba34 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -267,6 +267,7 @@ static dclist_head ri_constraint_cache_valid_list;
 
 static HTAB *ri_fastpath_cache = NULL;
 static bool ri_fastpath_callback_registered = false;
+static bool ri_fastpath_flushing = false;
 
 /*
  * Local function prototypes
@@ -469,14 +470,31 @@ RI_FKey_check(TriggerData *trigdata)
 	 */
 	if (ri_fastpath_is_applicable(riinfo))
 	{
-		if (AfterTriggerIsActive())
+		if (AfterTriggerIsActive() &&
+			GetCurrentTransactionNestLevel() == 1 &&
+			!ri_fastpath_flushing)
 		{
 			/* Batched path: buffer and probe in groups */
 			ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
 		}
 		else
 		{
-			/* ALTER TABLE validation: per-row, no cache */
+			/*
+			 * Per-row path, used when batching is not safe or not
+			 * applicable:
+			 *
+			 * - ALTER TABLE validation, where no after-trigger firing is
+			 *   active;
+			 *
+			 * - any FK check inside a subtransaction, since the batch cache
+			 *   is confined to the top transaction level (it cannot be
+			 *   cleanly unwound on subxact abort);
+			 *
+			 * - a re-entrant check from user cast/operator code running
+			 *   during a batch flush, since adding a cache entry while
+			 *   ri_FastPathEndBatch is iterating the cache could leave it
+			 *   unflushed.
+			 */
 			ri_FastPathCheck(riinfo, fk_rel, newslot);
 		}
 		return PointerGetDatum(NULL);
@@ -4174,19 +4192,41 @@ ri_FastPathEndBatch(void *arg)
 	if (ri_fastpath_cache == NULL)
 		return;
 
-	/* Flush any partial batches -- can throw ERROR */
-	hash_seq_init(&status, ri_fastpath_cache);
-	while ((entry = hash_seq_search(&status)) != NULL)
+	/*
+	 * Set a flag for the duration of the scan so that any FK check triggered
+	 * by user cast or operator code during a flush takes the per-row path
+	 * instead of adding a new entry to the cache we are iterating.  A new
+	 * entry could land in an already-scanned bucket and then be torn down
+	 * unflushed below.
+	 *
+	 * The flush can throw ERROR (a reported constraint violation, or an error
+	 * from the user code it runs).  In that case ri_FastPathTeardown below is
+	 * skipped; the ResourceOwner and the transaction-end callback handle
+	 * resource cleanup on the abort path.  The PG_FINALLY only resets the
+	 * flag and deliberately does not attempt teardown.
+	 */
+	Assert(!ri_fastpath_flushing);
+	ri_fastpath_flushing = true;
+	PG_TRY();
 	{
-		if (entry->batch_count > 0)
+		hash_seq_init(&status, ri_fastpath_cache);
+		while ((entry = hash_seq_search(&status)) != NULL)
 		{
-			Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
-			RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
+			if (entry->batch_count > 0)
+			{
+				Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
+				RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
 
-			ri_FastPathBatchFlush(entry, fk_rel, riinfo);
-			table_close(fk_rel, NoLock);
+				ri_FastPathBatchFlush(entry, fk_rel, riinfo);
+				table_close(fk_rel, NoLock);
+			}
 		}
 	}
+	PG_FINALLY();
+	{
+		ri_fastpath_flushing = false;
+	}
+	PG_END_TRY();
 
 	ri_FastPathTeardown();
 }
@@ -4238,22 +4278,13 @@ ri_FastPathXactCallback(XactEvent event, void *arg)
 	 */
 	ri_fastpath_cache = NULL;
 	ri_fastpath_callback_registered = false;
-}
 
-static void
-ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
-						   SubTransactionId parentSubid, void *arg)
-{
-	if (event == SUBXACT_EVENT_ABORT_SUB)
-	{
-		/*
-		 * ResourceOwner already released relations.  NULL the static pointers
-		 * so the still-registered batch callback becomes a no-op for the rest
-		 * of this transaction.
-		 */
-		ri_fastpath_cache = NULL;
-		ri_fastpath_callback_registered = false;
-	}
+	/*
+	 * Also clear the in-flush flag.  ri_FastPathEndBatch() already clears it
+	 * via PG_FINALLY, so this is just defensive: it keeps a stale flag from
+	 * surviving into the next transaction should any future path leave it set.
+	 */
+	ri_fastpath_flushing = false;
 }
 
 /*
@@ -4280,7 +4311,6 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
 		if (!ri_fastpath_xact_callback_registered)
 		{
 			RegisterXactCallback(ri_FastPathXactCallback, NULL);
-			RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
 			ri_fastpath_xact_callback_registered = true;
 		}
 
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index e08dff99f03..e1563144d4c 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3768,3 +3768,27 @@ SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 (1 row)
 
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+ERROR:  insert or update on table "fp_subxact_fk" violates foreign key constraint "fp_subxact_fk_fkey"
+DETAIL:  Key (a)=(999) is not present in table "fp_subxact_pk".
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 87381194f41..abeb85965b9 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2726,3 +2726,26 @@ BEGIN
 END$$;
 SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
-- 
2.47.3
