From: Amit Langote <[email protected]>
To: Junwang Zhao <[email protected]>
Cc: Haibo Yan <[email protected]>
Cc: Pavel Stehule <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: Tomas Vondra <[email protected]>
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
Date: Tue, 24 Mar 2026 20:47:17 +0900
Message-ID: <CA+HiwqFV-PY-3BxM6j5TaAiC3AwedDxo-6vwRSbvygg3zF+xAQ@mail.gmail.com>
In-Reply-To: <CAEG8a3+Hf4tvvbts29_k_AFhWQmRYfEo_SW4C5FY_140iKghBw@mail.gmail.com>
References: <CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com>
	<CA+HiwqGM6nvAV5O+=Nr+BXMPWOma0oeCRVzVP0XiLE8zX5TVAg@mail.gmail.com>
	<CA+HiwqGMaovCUgDbGxVGnK0Mrivr+ph3YE2Ws+47-ugyPb4f7g@mail.gmail.com>
	<CAFj8pRDaiBe_GOLk_yyYHTtPiDAAaLOM8u1-=Q3ZgXBTH+1igg@mail.gmail.com>
	<CA+HiwqGA5Ay_MR0eJEEbt4j6WrVh4F+AasTp8yCbs5aJLOJn6Q@mail.gmail.com>
	<CAEG8a3JM=NoqiTK0V6S9FNxZPvy1+C5F7rfafTtPKBVWnunL-g@mail.gmail.com>
	<CA+HiwqEyiLCY6MTLbOJXDdLNNQLaURYHvdW797MQgbjEK9od4Q@mail.gmail.com>
	<CAEG8a3+VBpwPf1Rm-ECD90whM9b3YnGhux5CVXdsL6khiBfzRQ@mail.gmail.com>
	<CA+HiwqF2UHzF0sKCp-F2a-U29rqh_9ZPy=f1h+Fh_=M8efj3pg@mail.gmail.com>
	<CAEG8a3L9Ew-WL8sxLROVOcypeaENPmd8qCmMvz4geoGL1TDGCA@mail.gmail.com>
	<CAEG8a3+nUFQo4sdPQF9xy0J73J8RFJ5U9A5+_kMosGDaZ+1sXA@mail.gmail.com>
	<[email protected]>
	<CAEG8a3JyKdizWvYsF+z_mA1BKy=dpW11iKVMOG=bk6Tbz6M1Bw@mail.gmail.com>
	<CAEG8a3+Hf4tvvbts29_k_AFhWQmRYfEo_SW4C5FY_140iKghBw@mail.gmail.com>

Hi Junwang,

On Fri, Mar 20, 2026 at 1:20 AM Junwang Zhao <[email protected]> wrote:
> I squashed 0004 into 0003 so that each file can be committed independently.
> I also ran pgindent for each file.

Thanks for that.

Here's another version.

In 0001, I noticed that the condition change in ri_HashCompareOp could
be simplified further.  Also improved the commentary surrounding that.
I also updated the commit message to clarify parity with the SPI path.

Updated the commit message of 0002 to explain the trade-off in caching
the snapshot for the entire trigger-firing cycle of a given
constraint.  The SPI path retakes the snapshot for every row checked,
so it could in principle avoid failure for FK rows whose corresponding
PK row was added by a concurrently committed transaction, at least in
the READ COMMITTED case.

Updated the commit message of 0003 to clarify that it replaces
ri_FastPathCheckCached() from 0002 with the BatchAdd/BatchFlush pair,
and that the cached resources are used unchanged -- only the probing
cadence changes from per-row to per-flush.  Per-flush CCI is safe
because all AFTER triggers for the buffered rows have already fired
by flush time; a new test case is added to show that.
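
As a rough illustration of the per-row versus per-flush cadence
described above, here is a toy standalone sketch.  It is not code from
the patch: ToyEntry, toy_pk_exists(), toy_flush(), toy_batch_add(),
toy_end_batch() and toy_run() are hypothetical names, the "PK index" is
just the integer range 1..pk_max, and violations are counted where the
real code raises an ERROR.  Only the batch size of 64 is taken from
0003.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 64			/* same value as RI_FASTPATH_BATCH_SIZE in 0003 */

/* Hypothetical stand-in for the per-constraint cache entry. */
typedef struct ToyEntry
{
	int			batch[BATCH_SIZE];	/* buffered FK values */
	int			batch_count;
	int			nflushes;		/* number of times per-flush work was done */
	int			nviolations;	/* the real code reports an ERROR instead */
} ToyEntry;

/* Toy "PK index" (assumption): keys 1..pk_max exist. */
static bool
toy_pk_exists(int key, int pk_max)
{
	return key >= 1 && key <= pk_max;
}

/* Probe all buffered values; the per-flush overhead is paid once here. */
static void
toy_flush(ToyEntry *entry, int pk_max)
{
	if (entry->batch_count == 0)
		return;

	entry->nflushes++;			/* stands in for CCI + security context switch */
	for (int i = 0; i < entry->batch_count; i++)
	{
		if (!toy_pk_exists(entry->batch[i], pk_max))
			entry->nviolations++;
	}
	entry->batch_count = 0;
}

/* Called once per FK row: buffer it, flush only when the buffer is full. */
static void
toy_batch_add(ToyEntry *entry, int fk_value, int pk_max)
{
	assert(entry->batch_count < BATCH_SIZE);
	entry->batch[entry->batch_count++] = fk_value;
	if (entry->batch_count >= BATCH_SIZE)
		toy_flush(entry, pk_max);
}

/* Called once at the end of the firing cycle: flush any partial batch. */
static void
toy_end_batch(ToyEntry *entry, int pk_max)
{
	toy_flush(entry, pk_max);
}

/* Buffer FK values 1..nrows and return the number of flushes performed. */
static int
toy_run(int nrows, int pk_max, int *nviolations)
{
	ToyEntry	entry = {0};

	for (int v = 1; v <= nrows; v++)
		toy_batch_add(&entry, v, pk_max);
	toy_end_batch(&entry, pk_max);

	*nviolations = entry.nviolations;
	return entry.nflushes;
}
```

In this toy model, 150 rows cost three flushes (two full batches plus
one partial batch at the end), where a per-row cadence would pay the
same overhead 150 times.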

Finally I added a short line at the end of each patch's commit message
to mention the speedup observed at each stage.  There are placeholders
such as <commit-hash-0001> that I will replace with actual commit
hashes before pushing.

I will continue staring at these for any remaining issues before
pushing them one-by-one at some point by early next week.  Happy to
hear any thoughts before I push.

-- 
Thanks, Amit Langote


Attachments:

  [application/octet-stream] v9-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch (28.6K, 2-v9-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch)
From 3086452291a81844c9f9789082362a7e5769de64 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 20:09:07 +0900
Subject: [PATCH v9 3/3] Batch FK rows and use SK_SEARCHARRAY for fast-path
 probes

Instead of probing the PK index on each trigger invocation, buffer
FK rows in the per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch.  When the buffer fills (64 rows) or the
trigger-firing cycle ends, ri_FastPathBatchFlush() probes the index
for all buffered rows, sharing a single CommandCounterIncrement and
security context switch across the batch.

ri_FastPathCheckCached() from <commit-hash-0002>, which probed the
index once per trigger invocation using cached resources, is replaced
by ri_FastPathBatchAdd() which buffers rows, and
ri_FastPathBatchFlush() which probes for the entire batch at once.
The cached resources (pk_rel, idx_rel, scandesc, slot, snapshot) are
used unchanged; the difference is that CCI, security context switch,
and curcid patching now happen once per flush rather than per row.
Per-flush CCI is safe because by the time a flush runs, all AFTER
triggers for the buffered rows have already fired.

For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag.  The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row.  A matched[] bitmap tracks which batch items
were satisfied; unmatched items are reported as violations.

Multi-column foreign keys fall back to a per-row probe loop via
ri_FastPathFlushLoop().

FK tuples are materialized via ExecCopySlotHeapTuple() into
TopTransactionContext so they survive across trigger invocations.
Violations are reported immediately during the flush via
ri_ReportViolation(), which does not return.

ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources.  Since the FK relation may already be closed by
flush time (e.g. for deferred constraints at COMMIT), it reopens the
relation using entry->riinfo->fk_relid if needed.

The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.

Introduce two purpose-specific memory contexts:

  - scan_cxt: child of TopTransactionContext for index scan
    allocations (e.g. _bt_preprocess_keys).  Lives for the
    trigger-firing batch, deleted at teardown, so these allocations
    are freed when the batch ends instead of at transaction end.

  - flush_cxt: child of scan_cxt for per-flush transient work (cast
    results, search array).  Reset after each flush; deleting
    scan_cxt in teardown also frees flush_cxt.

Benchmarking shows that together with <commit-hash-0001> and
<commit-hash-0002>, bulk FK inserts are ~2.9x faster (int PK / int FK,
1M rows, PK table and index cached).

Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 441 +++++++++++++++++++---
 src/test/regress/expected/foreign_key.out |  40 ++
 src/test/regress/sql/foreign_key.sql      |  38 ++
 3 files changed, 466 insertions(+), 53 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 12de0dd2cf6..993c3ac49a3 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,13 +196,28 @@ typedef struct RI_CompareHashEntry
 	FmgrInfo	cast_func_finfo;	/* in case we must coerce input */
 } RI_CompareHashEntry;
 
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal.  But each
+ * buffered row is a materialized HeapTuple in TopTransactionContext,
+ * and the matched[] scan in ri_FastPathFlushArray() is O(batch_size)
+ * per index match.  Benchmarking showed little difference between 16
+ * and 64, with 256 consistently slower.  64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE	64
+
 /*
  * RI_FastPathEntry
- *		Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ *		Per-constraint cache of resources needed by ri_FastPathBatchFlush().
  *
  * One entry per constraint, keyed by pg_constraint OID.  Created lazily
  * by ri_FastPathGetEntry() on first use within a trigger-firing batch
  * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
  */
 typedef struct RI_FastPathEntry
 {
@@ -210,8 +225,17 @@ typedef struct RI_FastPathEntry
 	Relation	pk_rel;
 	Relation	idx_rel;
 	IndexScanDesc scandesc;
-	TupleTableSlot *slot;
+	TupleTableSlot *pk_slot;
+	TupleTableSlot *fk_slot;
 	Snapshot	snapshot;		/* registered snapshot for the scan */
+	MemoryContext scan_cxt;		/* index scan allocations */
+	MemoryContext flush_cxt;	/* short-lived context for per-flush work */
+
+	HeapTuple	batch[RI_FASTPATH_BATCH_SIZE];
+	int			batch_count;
+
+	/* For ri_FastPathEndBatch() */
+	const RI_ConstraintInfo *riinfo;
 } RI_FastPathEntry;
 
 /*
@@ -274,8 +298,14 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 							bool detectNewRows, int expect_OK);
 static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
 							 Relation fk_rel, TupleTableSlot *newslot);
-static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
-								   Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+								Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+								  const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+								 const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+								  Relation fk_rel);
 static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
 								IndexScanDesc scandesc, TupleTableSlot *slot,
 								Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -300,8 +330,8 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 										   int queryno, bool is_restrict, bool partgone);
 static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
 											 Relation fk_rel);
-static void ri_FastPathTeardown(void *arg);
-
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
 
 /*
  * RI_FKey_check -
@@ -411,16 +441,22 @@ RI_FKey_check(TriggerData *trigdata)
 	 * lock.  This is semantically equivalent to the SPI path below but avoids
 	 * the per-row executor overhead.
 	 *
-	 * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+	 * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
 	 * themselves if no matching PK row is found, so they only return on
 	 * success.
 	 */
 	if (ri_fastpath_is_applicable(riinfo))
 	{
 		if (AfterTriggerBatchIsActive())
-			ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+		{
+			/* Batched path: buffer and probe in groups */
+			ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+		}
 		else
+		{
+			/* ALTER TABLE validation: per-row, no cache */
 			ri_FastPathCheck(riinfo, fk_rel, newslot);
+		}
 		return PointerGetDatum(NULL);
 	}
 
@@ -2703,10 +2739,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 
 /*
  * ri_FastPathCheck
- *		Perform FK existence check via direct index probe, bypassing SPI.
+ *		Perform per row FK existence check via direct index probe,
+ *		bypassing SPI.
  *
  * If no matching PK row exists, report the violation via ri_ReportViolation(),
  * otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
  */
 static void
 ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2771,70 +2811,311 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
 }
 
 /*
- * ri_FastPathCheckCached
- *		Cached-resource variant of ri_FastPathCheck for use within the
- *		after-trigger framework.
+ * ri_FastPathBatchAdd
+ *		Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer.  When the buffer is full, flushes all
+ * buffered rows by probing the PK index.  Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
  *
  * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
  * open/close, scan begin/end, and snapshot registration.  The snapshot's
- * curcid is patched each call so the scan sees effects of prior triggers.
+ * curcid is patched at flush time so the scan sees effects of prior triggers.
  *
- * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
- * if no matching PK row is found.
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
  */
 static void
-ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
-					   Relation fk_rel, TupleTableSlot *newslot)
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+					Relation fk_rel, TupleTableSlot *newslot)
 {
 	RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+	MemoryContext oldcxt;
+
+	oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+	fpentry->batch[fpentry->batch_count] =
+		ExecCopySlotHeapTuple(newslot);
+	fpentry->batch_count++;
+	MemoryContextSwitchTo(oldcxt);
+
+	if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+		ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ *		Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing).  Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+	const RI_ConstraintInfo *riinfo = fpentry->riinfo;
 	Relation	pk_rel = fpentry->pk_rel;
 	Relation	idx_rel = fpentry->idx_rel;
-	IndexScanDesc scandesc = fpentry->scandesc;
 	Snapshot	snapshot = fpentry->snapshot;
-	TupleTableSlot *slot = fpentry->slot;
-	Datum		pk_vals[INDEX_MAX_KEYS];
-	char		pk_nulls[INDEX_MAX_KEYS];
-	ScanKeyData skey[INDEX_MAX_KEYS];
-	bool		found;
+	TupleTableSlot *fk_slot = fpentry->fk_slot;
 	Oid			saved_userid;
 	int			saved_sec_context;
-	MemoryContext oldcxt;
+	MemoryContext oldcxt = CurrentMemoryContext;
 
-	/*
-	 * Advance the command counter and patch the cached snapshot's curcid so
-	 * the scan sees PK rows inserted by earlier triggers in this statement.
-	 */
-	CommandCounterIncrement();
-	fpentry->snapshot->curcid = GetCurrentCommandId(false);
+	if (fpentry->batch_count == 0)
+		return;
 
 	if (riinfo->fpmeta == NULL)
 		ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
 									  fk_rel, idx_rel);
 	Assert(riinfo->fpmeta);
 
+	/*
+	 * CCI and security context switch are done once for the entire batch.
+	 * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+	 * triggers for the buffered rows have already fired (trigger invocations
+	 * strictly alternate per row), so a single CCI advances past all their
+	 * effects.  Per-row security context switch is unnecessary because each
+	 * row's probe runs entirely as the PK table owner, same as the SPI path
+	 * -- the only difference is that the SPI path sets and restores the
+	 * context per row whereas we do it once around the whole batch.
+	 */
+	CommandCounterIncrement();
+	snapshot->curcid = GetCurrentCommandId(false);
+
 	GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
 	SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
 						   saved_sec_context |
 						   SECURITY_LOCAL_USERID_CHANGE |
 						   SECURITY_NOFORCE_RLS);
 
-	ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
-	build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+	if (riinfo->nkeys == 1)
+		ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
+	else
+		ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel);
+	MemoryContextSwitchTo(oldcxt);
+	SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+	/* Free materialized tuples and reset */
+	for (int i = 0; i < fpentry->batch_count; i++)
+		heap_freetuple(fpentry->batch[i]);
+	fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ *		Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ */
+static void
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+					 const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+	Relation	pk_rel = fpentry->pk_rel;
+	Relation	idx_rel = fpentry->idx_rel;
+	IndexScanDesc scandesc = fpentry->scandesc;
+	TupleTableSlot *pk_slot = fpentry->pk_slot;
+	Snapshot	snapshot = fpentry->snapshot;
+	Datum		pk_vals[INDEX_MAX_KEYS];
+	char		pk_nulls[INDEX_MAX_KEYS];
+	ScanKeyData skey[INDEX_MAX_KEYS];
+
+	for (int i = 0; i < fpentry->batch_count; i++)
+	{
+		bool		found = false;
+
+		ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+
+		/*
+		 * build_index_scankeys() may palloc cast results for cross-type FKs.
+		 * Use the entry's short-lived flush context so these don't accumulate
+		 * across batches.
+		 */
+		MemoryContextSwitchTo(fpentry->flush_cxt);
+		ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+		build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+		MemoryContextSwitchTo(fpentry->scan_cxt);
+
+		found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+									snapshot, riinfo, skey, riinfo->nkeys);
+
+		if (!found)
+			ri_ReportViolation(riinfo, pk_rel, fk_rel,
+							   fk_slot, NULL,
+							   RI_PLAN_CHECK_LOOKUPPK, false, false);
+	}
+	MemoryContextReset(fpentry->flush_cxt);
+}
+
+/*
+ * ri_FastPathFlushArray
+ *		Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY.  The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order.  Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ */
+static void
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+					  const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+	FastPathMeta *fpmeta = riinfo->fpmeta;
+	Relation	pk_rel = fpentry->pk_rel;
+	Relation	idx_rel = fpentry->idx_rel;
+	IndexScanDesc scandesc = fpentry->scandesc;
+	TupleTableSlot *pk_slot = fpentry->pk_slot;
+	Snapshot	snapshot = fpentry->snapshot;
+	Datum		search_vals[RI_FASTPATH_BATCH_SIZE];
+	bool		matched[RI_FASTPATH_BATCH_SIZE];
+	int			nvals = fpentry->batch_count;
+	Datum		pk_vals[INDEX_MAX_KEYS];
+	char		pk_nulls[INDEX_MAX_KEYS];
+	ScanKeyData skey[1];
+	RI_CompareHashEntry *entry;
+	Oid			elem_type;
+	int16		elem_len;
+	bool		elem_byval;
+	char		elem_align;
+	ArrayType  *arr;
+
+	Assert(fpmeta);
+
+	memset(matched, 0, nvals * sizeof(bool));
 
 	/*
-	 * The cached scandesc lives in TopTransactionContext, but the btree AM
-	 * defers some allocations to the first index_getnext_slot call.  Ensure
-	 * those land in TopTransactionContext too.
+	 * Transient per-flush allocations (cast results, the search array) must
+	 * not accumulate across repeated flushes.  Use the entry's short-lived
+	 * flush context, reset after each flush.
 	 */
-	oldcxt = MemoryContextSwitchTo(TopTransactionContext);
-	found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
-								riinfo, skey, riinfo->nkeys);
-	MemoryContextSwitchTo(oldcxt);
-	SetUserIdAndSecContext(saved_userid, saved_sec_context);
+	MemoryContextSwitchTo(fpentry->flush_cxt);
 
-	if (!found)
-		ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
-						   RI_PLAN_CHECK_LOOKUPPK, false, false);
+	/*
+	 * Extract FK values, casting to the operator's expected input type if
+	 * needed (e.g. int8 FK -> int4 for int48eq).
+	 */
+	entry = fpmeta->compare_entries[0];
+	for (int i = 0; i < nvals; i++)
+	{
+		ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+		ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+		/* Cast if needed (e.g. int8 FK -> numeric PK) */
+		if (OidIsValid(entry->cast_func_finfo.fn_oid))
+			search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+										   pk_vals[0],
+										   Int32GetDatum(-1),
+										   BoolGetDatum(false));
+		else
+			search_vals[i] = pk_vals[0];
+	}
+
+	/*
+	 * Array element type must match the operator's right-hand input type,
+	 * which is what the index comparison expects on the search side.
+	 * ri_populate_fastpath_metadata() stores exactly this via
+	 * get_op_opfamily_properties(), which returns the operator's right-hand
+	 * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+	 * and the common type for same-type operators.
+	 */
+	elem_type = fpmeta->subtypes[0];
+	Assert(OidIsValid(elem_type));
+	get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+	arr = construct_array(search_vals, nvals,
+						  elem_type, elem_len, elem_byval, elem_align);
+
+	/*
+	 * Build scan key with SK_SEARCHARRAY.  The index AM code will internally
+	 * sort and deduplicate, then walk leaf pages in order.
+	 */
+	ScanKeyEntryInitialize(&skey[0],
+						   SK_SEARCHARRAY,
+						   1,	/* attno */
+						   fpmeta->strats[0],
+						   fpmeta->subtypes[0],
+						   idx_rel->rd_indcollation[0],
+						   fpmeta->regops[0],
+						   PointerGetDatum(arr));
+
+	/*
+	 * Switch to scan_cxt for the index scan: index AMs may defer internal
+	 * allocations (e.g. _bt_preprocess_keys) to the first
+	 * index_getnext_slot() call.  Those must survive across rescans within a
+	 * batch; scan_cxt is deleted in teardown, cleaning them up when the batch
+	 * ends.
+	 */
+	MemoryContextSwitchTo(fpentry->scan_cxt);
+
+	index_rescan(scandesc, skey, 1, NULL, 0);
+
+	/*
+	 * Walk all matches.  The index AM returns them in index order.  For each
+	 * match, find which batch item(s) it satisfies.
+	 */
+	while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+	{
+		Datum		found_val;
+		bool		found_null;
+		bool		concurrently_updated;
+		ScanKeyData recheck_skey[1];
+
+		if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+			continue;
+
+		/* Extract the PK value from the matched and locked tuple */
+		found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+		Assert(!found_null);
+
+		if (concurrently_updated)
+		{
+			/*
+			 * Build a single-key scankey for recheck.  We need the actual PK
+			 * value that was found, not the FK search value.
+			 */
+			ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+								   fpmeta->strats[0],
+								   fpmeta->subtypes[0],
+								   idx_rel->rd_indcollation[0],
+								   fpmeta->regops[0],
+								   found_val);
+			if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+				continue;
+		}
+
+		/*
+		 * Linear scan to mark all batch items matching this PK value.
+		 * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+		 * current batch size of 64.
+		 */
+		for (int i = 0; i < nvals; i++)
+		{
+			if (!matched[i] &&
+				DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+											   idx_rel->rd_indcollation[0],
+											   found_val,
+											   search_vals[i])))
+				matched[i] = true;
+		}
+	}
+
+	/* Report first unmatched row */
+	for (int i = 0; i < nvals; i++)
+	{
+		if (!matched[i])
+		{
+			ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+			ri_ReportViolation(riinfo, pk_rel, fk_rel,
+							   fk_slot, NULL,
+							   RI_PLAN_CHECK_LOOKUPPK, false, false);
+		}
+	}
+
+	MemoryContextReset(fpentry->flush_cxt);
 }
 
 /*
@@ -2845,9 +3126,10 @@ ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
  * Returns true if a matching PK row was found, locked, and (if
  * applicable) visible to the transaction snapshot.
  *
- * The caller must ensure CurrentMemoryContext is long-lived enough
- * for the scan descriptor's internal allocations (typically
- * TopTransactionContext when using a cached scandesc).
+ * When using a cached scandesc (from the batch path), the caller must switch
+ * to the entry's scan_cxt before calling so that index AM allocations during
+ * index_getnext_slot() survive across rescans.  ri_FastPathCheck uses a
+ * one-shot scan and ends it immediately, so no such switch is needed.
  */
 static bool
 ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
@@ -3769,14 +4051,51 @@ RI_FKey_trigger_type(Oid tgfoid)
 	return RI_TRIGGER_NONE;
 }
 
+/*
+ * ri_FastPathEndBatch
+ *		Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback.  Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation.  If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+	HASH_SEQ_STATUS status;
+	RI_FastPathEntry *entry;
+
+	if (ri_fastpath_cache == NULL)
+		return;
+
+	/* Flush any partial batches -- can throw ERROR */
+	hash_seq_init(&status, ri_fastpath_cache);
+	while ((entry = hash_seq_search(&status)) != NULL)
+	{
+		if (entry->batch_count > 0)
+		{
+			Relation	fk_rel = table_open(entry->riinfo->fk_relid,
+											AccessShareLock);
+
+			ri_FastPathBatchFlush(entry, fk_rel);
+			table_close(fk_rel, NoLock);
+		}
+	}
+
+	/* Orderly teardown */
+	ri_FastPathTeardown();
+}
+
 /*
  * ri_FastPathTeardown
  *		Tear down all cached fast-path state.
  *
- * Called as an AfterTriggerBatchCallback at end of batch.
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
  */
 static void
-ri_FastPathTeardown(void *arg)
+ri_FastPathTeardown(void)
 {
 	HASH_SEQ_STATUS status;
 	RI_FastPathEntry *entry;
@@ -3794,10 +4113,14 @@ ri_FastPathTeardown(void *arg)
 			index_close(entry->idx_rel, NoLock);
 		if (entry->pk_rel)
 			table_close(entry->pk_rel, NoLock);
-		if (entry->slot)
-			ExecDropSingleTupleTableSlot(entry->slot);
+		if (entry->pk_slot)
+			ExecDropSingleTupleTableSlot(entry->pk_slot);
+		if (entry->fk_slot)
+			ExecDropSingleTupleTableSlot(entry->fk_slot);
 		if (entry->snapshot)
 			UnregisterSnapshot(entry->snapshot);
+		if (entry->scan_cxt)
+			MemoryContextDelete(entry->scan_cxt);
 	}
 
 	hash_destroy(ri_fastpath_cache);
@@ -3911,23 +4234,32 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
 
 		/*
 		 * Register an initial snapshot.  Its curcid will be patched in place
-		 * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+		 * on each flush (see ri_FastPathBatchFlush()), avoiding
 		 * per-row GetSnapshotData() overhead.
 		 */
 		entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
 
-		entry->slot = table_slot_create(entry->pk_rel, NULL);
+		entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+		entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+												  &TTSOpsHeapTuple);
 
 		entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
 										  entry->snapshot, NULL,
 										  riinfo->nkeys, 0);
 
+		entry->scan_cxt = AllocSetContextCreate(TopTransactionContext,
+												"RI fast path scan context",
+												ALLOCSET_DEFAULT_SIZES);
+		entry->flush_cxt = AllocSetContextCreate(entry->scan_cxt,
+												 "RI fast path flush temporary context",
+												 ALLOCSET_SMALL_SIZES);
+
 		MemoryContextSwitchTo(oldcxt);
 
 		/* Ensure cleanup at end of this trigger-firing batch */
 		if (!ri_fastpath_callback_registered)
 		{
-			RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+			RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
 			ri_fastpath_callback_registered = true;
 		}
 
@@ -3938,6 +4270,9 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
 							   SECURITY_NOFORCE_RLS);
 		ri_CheckPermissions(entry->pk_rel);
 		SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+		/* For ri_FastPathEndBatch() */
+		entry->riinfo = riinfo;
 	}
 
 	return entry;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 25d505c6c12..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3590,3 +3590,43 @@ NOTICE:  fp_auto_pk called
 NOTICE:  fp_auto_pk called
 DROP TABLE fp_fk_cci, fp_pk_cci;
 DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+    FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR:  insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL:  Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+    DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR:  insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL:  Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR:  insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL:  Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index cedd20c8d11..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2578,3 +2578,41 @@ INSERT INTO fp_fk_cci VALUES (1), (2), (3);
 
 DROP TABLE fp_fk_cci, fp_pk_cci;
 DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+    FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+    DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
-- 
2.47.3



  [application/octet-stream] v9-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch (29.2K, 3-v9-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch)
From 81a0149aaf044dc32610355c9178a40e7f7b4d57 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 20:13:03 +0900
Subject: [PATCH v9 2/3] Cache per-batch resources for fast-path foreign key
 checks

The fast-path FK check introduced in <commit-hash-0001> opens and
closes the PK relation, index, scan descriptor, and tuple slot on
every trigger invocation.  For bulk operations that fire thousands of
FK triggers in a single statement, this repeated setup/teardown
dominates the cost.

Introduce RI_FastPathEntry, a per-constraint hash table that caches
the open Relation (pk_rel, idx_rel), IndexScanDesc, TupleTableSlot,
and a registered Snapshot across all trigger invocations within a
single trigger-firing batch.  Entries are created lazily on first use
via ri_FastPathGetEntry() and persist until the batch ends.

The snapshot is registered once at entry creation time, and its
curcid is patched in place on each subsequent row rather than
taking a fresh snapshot per invocation.  This avoids the per-row
ProcArrayLock acquire/release that GetSnapshotData() requires even
in its fast-path reuse case.  Under REPEATABLE READ the transaction
snapshot is immutable so caching is a no-op.  Under READ COMMITTED
the cached snapshot will not reflect PK rows committed by other
backends mid-batch.  The SPI path's per-row GetSnapshotData() might
catch these depending on timing, but that visibility is
non-deterministic -- whether a given row's FK check happens to see
a concurrent commit is a race, not a guarantee.  The FK check only
needs PK rows visible before the statement began plus effects of
earlier triggers (tracked by curcid), and LockTupleKeyShare prevents
the PK row from disappearing regardless.  CommandCounterIncrement
still runs on each invocation of ri_FastPathCheckCached(), matching
the SPI path's per-row CCI inside _SPI_execute_plan.

SnapshotSetCommandId() only patches the process-global statics, not
registered copies, so we patch entry->snapshot->curcid directly.
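
As an illustration only, the difference between patching the global
static and patching a registered copy can be modeled in a few lines of
standalone C.  Every name below (model_snapshot, model_register_snapshot,
model_set_command_id, model_patch_curcid) is hypothetical and is not
PostgreSQL's snapshot API; the sketch assumes only that registering a
static snapshot hands back a private copy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model; these are not PostgreSQL's real types or functions. */
typedef struct model_snapshot
{
	unsigned int curcid;		/* command id the scan is allowed to see */
} model_snapshot;

static model_snapshot current_snapshot = {0};	/* process-global static */

/* Like registering a static snapshot: the caller gets a private copy. */
model_snapshot *
model_register_snapshot(const model_snapshot *src)
{
	model_snapshot *copy = malloc(sizeof(model_snapshot));

	assert(copy != NULL);
	*copy = *src;
	return copy;
}

/* Updates only the global static, never copies handed out earlier. */
void
model_set_command_id(unsigned int cid)
{
	current_snapshot.curcid = cid;
}

/* What the cached path does instead: patch the registered copy itself. */
void
model_patch_curcid(model_snapshot *registered, unsigned int cid)
{
	registered->curcid = cid;
}
```

In this model, calling model_set_command_id() after registration
leaves the registered copy's curcid stale, which is why the copy has
to be patched directly.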

Permission checks (schema USAGE + table SELECT) are performed once at
cache entry creation rather than per flush.  The RI check runs as the
PK table owner (via SetUserIdAndSecContext), so in practice these
checks verify that the owner has access to their own table -- a
condition that holds unless privileges have been explicitly revoked
from the owner, which would equally break the SPI path.  Checking
once per batch avoids repeated syscache lookups from
pg_class_aclcheck() with no user-visible behavior change.

Lifecycle management:

  - AfterTriggerBatchCallback: A new general-purpose callback
    mechanism in trigger.c.  Callbacks registered via
    RegisterAfterTriggerBatchCallback() fire at the end of each
    trigger-firing batch (AfterTriggerEndQuery for immediate
    constraints, AfterTriggerFireDeferred at COMMIT, and
    AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE).  The RI code
    registers ri_FastPathTeardown as a batch callback, which does
    orderly teardown: index_endscan, index_close, table_close,
    ExecDropSingleTupleTableSlot, UnregisterSnapshot.

  - Batch callbacks only fire at the outermost query level:
    FireAfterTriggerBatchCallbacks() returns early when
    query_depth > 0, so nested queries run via SPI inside other
    AFTER triggers do not tear down the cache mid-batch.

  - XactCallback: ri_FastPathXactCallback NULLs the static cache
    pointer at transaction end.  On the normal path, cleanup already
    ran via the batch callback; this handles the abort path, where
    ResourceOwner releases the actual resources and
    TopTransactionContext destruction frees the memory.

  - SubXactCallback: ri_FastPathSubXactCallback NULLs the static
    cache pointer on subtransaction abort.  ResourceOwner already
    cleaned up the resources; this prevents the batch callback from
    trying to double-close them.

  - AfterTriggerBatchIsActive(): Exported accessor that returns true
    when afterTriggers.query_depth >= 0.  During ALTER TABLE ... ADD
    FOREIGN KEY validation, RI triggers are called directly outside
    the after-trigger framework, so batch callbacks would never fire.
    The fast-path code uses this to fall back to a non-cached
    per-invocation path (open/scan/close each call) in that context.
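
The register/fire/clear cycle and the nesting guard described above
can be sketched as a small standalone C model.  This is a simplified
illustration, not the code in trigger.c: it uses malloc and a
hand-rolled linked list instead of palloc and List, and the names
register_batch_callback, fire_batch_callbacks and count_teardown are
made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from trigger.c. */
typedef void (*batch_callback) (void *arg);

typedef struct callback_item
{
	batch_callback callback;
	void	   *arg;
	struct callback_item *next;
} callback_item;

static callback_item *batch_callbacks = NULL;
static int	query_depth = -1;	/* -1 means no query is in progress */

/* Append a callback to the list; returns 0 on success, -1 on OOM. */
int
register_batch_callback(batch_callback callback, void *arg)
{
	callback_item *item = malloc(sizeof(callback_item));
	callback_item **tail = &batch_callbacks;

	assert(callback != NULL);
	if (item == NULL)
		return -1;
	item->callback = callback;
	item->arg = arg;
	item->next = NULL;
	while (*tail != NULL)
		tail = &(*tail)->next;
	*tail = item;
	return 0;
}

/*
 * Invoke and clear all callbacks, unless we are inside a nested query.
 * Returns the number of callbacks that ran.
 */
int
fire_batch_callbacks(void)
{
	int			nfired = 0;

	if (query_depth > 0)
		return 0;				/* outer batch still needs its resources */

	while (batch_callbacks != NULL)
	{
		callback_item *item = batch_callbacks;

		batch_callbacks = item->next;
		item->callback(item->arg);
		free(item);
		nfired++;
	}
	return nfired;
}

/* Example callback: counts how many times teardown ran. */
void
count_teardown(void *arg)
{
	(*(int *) arg)++;
}
```

Because the list is emptied after firing, a second call without
re-registration does nothing, which mirrors the requirement that
callers re-register for each new batch.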

Benchmarking shows that together with <commit-hash-0001>, bulk FK
inserts are ~2.2x faster (int PK / int FK, 1M rows, PK table
and index cached).

Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
 src/backend/commands/trigger.c            |  90 +++++++
 src/backend/utils/adt/ri_triggers.c       | 275 +++++++++++++++++++++-
 src/include/commands/trigger.h            |  18 ++
 src/test/regress/expected/foreign_key.out |  86 +++++++
 src/test/regress/sql/foreign_key.sql      |  80 +++++++
 src/tools/pgindent/typedefs.list          |   3 +
 6 files changed, 549 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..b7442cf6cb1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
 	/* per-subtransaction-level data: */
 	AfterTriggersTransData *trans_stack;	/* array of structs shown below */
 	int			maxtransdepth;	/* allocated len of above array */
+
+	List	   *batch_callbacks;	/* List of AfterTriggerCallbackItem */
 } AfterTriggersData;
 
 struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
 	TupleTableSlot *storeslot;	/* for converting to tuplestore's format */
 };
 
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+	AfterTriggerBatchCallback callback;
+	void	   *arg;
+} AfterTriggerCallbackItem;
+
 static AfterTriggersData afterTriggers;
 
 static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
 													Oid tgoid, bool tgisdeferred);
 static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
 
+static void FireAfterTriggerBatchCallbacks(void);
 
 /*
  * Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
 	 */
 	afterTriggers.firing_counter = (CommandId) 1;	/* mustn't be 0 */
 	afterTriggers.query_depth = -1;
+	afterTriggers.batch_callbacks = NIL;
 
 	/*
 	 * Verify that there is no leftover state remaining.  If these assertions
@@ -5210,6 +5221,8 @@ AfterTriggerEndQuery(EState *estate)
 			break;
 	}
 
+	FireAfterTriggerBatchCallbacks();
+
 	/* Release query-level-local storage, including tuplestores if any */
 	AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
 
@@ -5317,6 +5330,8 @@ AfterTriggerFireDeferred(void)
 			break;				/* all fired */
 	}
 
+	FireAfterTriggerBatchCallbacks();
+
 	/*
 	 * We don't bother freeing the event list, since it will go away anyway
 	 * (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6074,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
 				break;			/* all fired */
 		}
 
+		FireAfterTriggerBatchCallbacks();
+
 		if (snapshot_set)
 			PopActiveSnapshot();
 	}
@@ -6755,3 +6772,76 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
 
 	return tuple;
 }
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ *		Register a function to be called when the current trigger-firing
+ *		batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+								  void *arg)
+{
+	AfterTriggerCallbackItem *item;
+	MemoryContext oldcxt;
+
+	/*
+	 * Allocate in TopTransactionContext so the item survives for the duration
+	 * of the batch, which may span multiple trigger invocations.
+	 */
+	oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+	item = palloc(sizeof(AfterTriggerCallbackItem));
+	item->callback = callback;
+	item->arg = arg;
+	afterTriggers.batch_callbacks =
+		lappend(afterTriggers.batch_callbacks, item);
+	MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ *		Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT).  Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+	ListCell   *lc;
+
+	if (afterTriggers.query_depth > 0)
+		return;
+
+	foreach(lc, afterTriggers.batch_callbacks)
+	{
+		AfterTriggerCallbackItem *item = lfirst(lc);
+
+		item->callback(item->arg);
+	}
+
+	list_free_deep(afterTriggers.batch_callbacks);
+	afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ *		Returns true if we're inside a query-level trigger batch where
+ *		registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+	return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6d8de64471f..12de0dd2cf6 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,23 @@ typedef struct RI_CompareHashEntry
 	FmgrInfo	cast_func_finfo;	/* in case we must coerce input */
 } RI_CompareHashEntry;
 
+/*
+ * RI_FastPathEntry
+ *		Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ *
+ * One entry per constraint, keyed by pg_constraint OID.  Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ */
+typedef struct RI_FastPathEntry
+{
+	Oid			conoid;			/* hash key: pg_constraint OID */
+	Relation	pk_rel;
+	Relation	idx_rel;
+	IndexScanDesc scandesc;
+	TupleTableSlot *slot;
+	Snapshot	snapshot;		/* registered snapshot for the scan */
+} RI_FastPathEntry;
 
 /*
  * Local data
@@ -205,6 +222,8 @@ static HTAB *ri_query_cache = NULL;
 static HTAB *ri_compare_cache = NULL;
 static dclist_head ri_constraint_cache_valid_list;
 
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
 
 /*
  * Local function prototypes
@@ -255,6 +274,8 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 							bool detectNewRows, int expect_OK);
 static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
 							 Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+								   Relation fk_rel, TupleTableSlot *newslot);
 static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
 								IndexScanDesc scandesc, TupleTableSlot *slot,
 								Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +298,9 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 										   Relation pk_rel, Relation fk_rel,
 										   TupleTableSlot *violatorslot, TupleDesc tupdesc,
 										   int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+											 Relation fk_rel);
+static void ri_FastPathTeardown(void *arg);
 
 
 /*
@@ -387,12 +411,16 @@ RI_FKey_check(TriggerData *trigdata)
 	 * lock.  This is semantically equivalent to the SPI path below but avoids
 	 * the per-row executor overhead.
 	 *
-	 * ri_FastPathCheck() reports the violation itself (via ereport) if no
-	 * matching PK row is found, so it only returns on success.
+	 * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+	 * themselves if no matching PK row is found, so they only return on
+	 * success.
 	 */
 	if (ri_fastpath_is_applicable(riinfo))
 	{
-		ri_FastPathCheck(riinfo, fk_rel, newslot);
+		if (AfterTriggerBatchIsActive())
+			ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+		else
+			ri_FastPathCheck(riinfo, fk_rel, newslot);
 		return PointerGetDatum(NULL);
 	}
 
@@ -2742,6 +2770,73 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
 	table_close(pk_rel, NoLock);
 }
 
+/*
+ * ri_FastPathCheckCached
+ *		Cached-resource variant of ri_FastPathCheck for use within the
+ *		after-trigger framework.
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, scan begin/end, and snapshot registration.  The snapshot's
+ * curcid is patched each call so the scan sees effects of prior triggers.
+ *
+ * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
+ * if no matching PK row is found.
+ */
+static void
+ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+					   Relation fk_rel, TupleTableSlot *newslot)
+{
+	RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+	Relation	pk_rel = fpentry->pk_rel;
+	Relation	idx_rel = fpentry->idx_rel;
+	IndexScanDesc scandesc = fpentry->scandesc;
+	Snapshot	snapshot = fpentry->snapshot;
+	TupleTableSlot *slot = fpentry->slot;
+	Datum		pk_vals[INDEX_MAX_KEYS];
+	char		pk_nulls[INDEX_MAX_KEYS];
+	ScanKeyData skey[INDEX_MAX_KEYS];
+	bool		found;
+	Oid			saved_userid;
+	int			saved_sec_context;
+	MemoryContext oldcxt;
+
+	/*
+	 * Advance the command counter and patch the cached snapshot's curcid so
+	 * the scan sees PK rows inserted by earlier triggers in this statement.
+	 */
+	CommandCounterIncrement();
+	fpentry->snapshot->curcid = GetCurrentCommandId(false);
+
+	if (riinfo->fpmeta == NULL)
+		ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+									  fk_rel, idx_rel);
+	Assert(riinfo->fpmeta);
+
+	GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+	SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+						   saved_sec_context |
+						   SECURITY_LOCAL_USERID_CHANGE |
+						   SECURITY_NOFORCE_RLS);
+
+	ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+	build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+	/*
+	 * The cached scandesc lives in TopTransactionContext, but the btree AM
+	 * defers some allocations to the first index_getnext_slot call.  Ensure
+	 * those land in TopTransactionContext too.
+	 */
+	oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+	found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
+								riinfo, skey, riinfo->nkeys);
+	MemoryContextSwitchTo(oldcxt);
+	SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+	if (!found)
+		ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
+						   RI_PLAN_CHECK_LOOKUPPK, false, false);
+}
+
 /*
  * ri_FastPathProbeOne
  *		Probe the PK index for one set of scan keys, lock the matching
@@ -3673,3 +3768,177 @@ RI_FKey_trigger_type(Oid tgfoid)
 
 	return RI_TRIGGER_NONE;
 }
+
+/*
+ * ri_FastPathTeardown
+ *		Tear down all cached fast-path state.
+ *
+ * Called as an AfterTriggerBatchCallback at end of batch.
+ */
+static void
+ri_FastPathTeardown(void *arg)
+{
+	HASH_SEQ_STATUS status;
+	RI_FastPathEntry *entry;
+
+	if (ri_fastpath_cache == NULL)
+		return;
+
+	hash_seq_init(&status, ri_fastpath_cache);
+	while ((entry = hash_seq_search(&status)) != NULL)
+	{
+		/* End the scan before closing idx_rel. */
+		if (entry->scandesc)
+			index_endscan(entry->scandesc);
+		if (entry->idx_rel)
+			index_close(entry->idx_rel, NoLock);
+		if (entry->pk_rel)
+			table_close(entry->pk_rel, NoLock);
+		if (entry->slot)
+			ExecDropSingleTupleTableSlot(entry->slot);
+		if (entry->snapshot)
+			UnregisterSnapshot(entry->snapshot);
+	}
+
+	hash_destroy(ri_fastpath_cache);
+	ri_fastpath_cache = NULL;
+	ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+	/*
+	 * TopTransactionContext is destroyed at end of transaction, taking the
+	 * hash table and all cached resources with it.  Just reset our static
+	 * pointers so we don't dereference freed memory.
+	 *
+	 * In the normal (non-error) path, ri_FastPathTeardown already ran via the
+	 * batch callback and did orderly teardown.  Here we're just handling the
+	 * abort path where that callback never fired.
+	 */
+	ri_fastpath_cache = NULL;
+	ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+						   SubTransactionId parentSubid, void *arg)
+{
+	if (event == SUBXACT_EVENT_ABORT_SUB)
+	{
+		/*
+		 * ResourceOwner already cleaned up relations and snapshots.  Just
+		 * NULL our pointers so the still-registered batch callback becomes a
+		 * no-op.  The hash table memory in TopTransactionContext will be
+		 * freed at transaction end.
+		 */
+		ri_fastpath_cache = NULL;
+		ri_fastpath_callback_registered = false;
+	}
+}
+
+/*
+ * ri_FastPathGetEntry
+ *		Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, begins an index scan, allocates a result slot, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.  Caller uses
+ * index_rescan() with new keys.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+	RI_FastPathEntry *entry;
+	bool		found;
+
+	/* Create hash table on first use in this batch */
+	if (ri_fastpath_cache == NULL)
+	{
+		HASHCTL		ctl;
+
+		if (!ri_fastpath_xact_callback_registered)
+		{
+			RegisterXactCallback(ri_FastPathXactCallback, NULL);
+			RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+			ri_fastpath_xact_callback_registered = true;
+		}
+
+		ctl.keysize = sizeof(Oid);
+		ctl.entrysize = sizeof(RI_FastPathEntry);
+		ctl.hcxt = TopTransactionContext;
+		ri_fastpath_cache = hash_create("RI fast-path cache",
+										16,
+										&ctl,
+										HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+	}
+
+	entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+						HASH_ENTER, &found);
+
+	if (!found)
+	{
+		MemoryContext oldcxt;
+		Oid			saved_userid;
+		int			saved_sec_context;
+
+		/*
+		 * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+		 * out during partial initialization below.
+		 */
+		memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+			   sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+		oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+		/*
+		 * Open PK table and its unique index.
+		 *
+		 * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+		 * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+		 * on the index is standard for index scans.
+		 *
+		 * We don't release these locks until end of transaction, matching SPI
+		 * behavior.
+		 */
+		entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+		entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+		/*
+		 * Register an initial snapshot.  Its curcid will be patched in place
+		 * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+		 * per-row GetSnapshotData() overhead.
+		 */
+		entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+		entry->slot = table_slot_create(entry->pk_rel, NULL);
+
+		entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
+										  entry->snapshot, NULL,
+										  riinfo->nkeys, 0);
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/* Ensure cleanup at end of this trigger-firing batch */
+		if (!ri_fastpath_callback_registered)
+		{
+			RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+			ri_fastpath_callback_registered = true;
+		}
+
+		GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+		SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+							   saved_sec_context |
+							   SECURITY_LOCAL_USERID_CHANGE |
+							   SECURITY_NOFORCE_RLS);
+		ri_CheckPermissions(entry->pk_rel);
+		SetUserIdAndSecContext(saved_userid, saved_sec_context);
+	}
+
+	return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
 
 extern int	RI_FKey_trigger_type(Oid tgfoid);
 
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback().  Invoked when
+ * a batch of after-trigger processing completes:
+ *	- AfterTriggerEndQuery()      (immediate constraints)
+ *	- AfterTriggerFireDeferred()  (deferred constraints at COMMIT)
+ *	- AfterTriggerSetState()      (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch.  Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+											  void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
 #endif							/* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..25d505c6c12 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,89 @@ DETAIL:  drop cascades to table fkpart13_t1
 drop cascades to table fkpart13_t2
 drop cascades to table fkpart13_t3
 RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101);  -- should fail (constraint active)
+ERROR:  insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL:  Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200);  -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2;  -- should fail
+ERROR:  insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL:  Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+    a int REFERENCES fp_pk1,
+    b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1);  -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2);  -- second constraint fails
+ERROR:  insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL:  Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE;  -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3);  -- should fail, also tests that cache was cleaned up
+ERROR:  insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL:  Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a 
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+  RAISE NOTICE 'fp_auto_pk called';
+  INSERT INTO fp_pk_cci VALUES (NEW.a);
+  RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+  FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE:  fp_auto_pk called
+NOTICE:  fp_auto_pk called
+NOTICE:  fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..cedd20c8d11 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,83 @@ WITH cte AS (
 
 DROP SCHEMA fkpart13 CASCADE;
 RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101);  -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200);  -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2;  -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+    a int REFERENCES fp_pk1,
+    b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1);  -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2);  -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE;  -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3);  -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+  RAISE NOTICE 'fp_auto_pk called';
+  INSERT INTO fp_pk_cci VALUES (NEW.a);
+  RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+  FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c51a0a903a6..0b05304a294 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
 AddrInfo
 AffixNode
 AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
 AfterTriggerEvent
 AfterTriggerEventChunk
 AfterTriggerEventData
@@ -2478,6 +2480,7 @@ RIX
 RI_CompareHashEntry
 RI_CompareKey
 RI_ConstraintInfo
+RI_FastPathEntry
 RI_QueryHashEntry
 RI_QueryKey
 RTEKind
-- 
2.47.3



  [application/octet-stream] v9-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (31.1K, 4-v9-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From 92e8fd30d87a08fa675e7d15cf60b40c11d9afc8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 18:28:00 +0900
Subject: [PATCH v9 1/3] Add fast path for foreign key constraint checks

Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.

The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics.  Otherwise, the
existing SPI path is used.

ri_FastPathCheck() extracts the FK values, builds scan keys, performs
an index scan, and locks the matching tuple with LockTupleKeyShare
via ri_LockPKTuple(), which handles the RI-specific subset of
table_tuple_lock() results.

If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.

The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot).  Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.

The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path gets implicitly through
the executor's permission checks.  The fast path also switches to
the PK table owner's security context (with SECURITY_NOFORCE_RLS)
before the index probe, matching the SPI path where the query runs
as the table owner.

ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The existing code asserted same-type operators only, which was correct
for its existing callers (ri_KeysEqual compares same-type FK column
values via ff_eq_oprs), but the fast path is the first caller to pass
pf_eq_oprs, which can be cross-type.

Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.

New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks.  New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.

Benchmarking shows ~1.8x speedup for bulk FK inserts (int PK/int FK,
1M rows, where the PK table and its index are cached).

Author: Junwang Zhao <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c           | 469 +++++++++++++++++-
 .../expected/fk-concurrent-pk-upd.out         | 105 ++++
 src/test/isolation/isolation_schedule         |   1 +
 .../isolation/specs/fk-concurrent-pk-upd.spec |  53 ++
 src/test/regress/expected/foreign_key.out     |  47 ++
 src/test/regress/sql/foreign_key.sql          |  64 +++
 src/tools/pgindent/typedefs.list              |   1 +
 7 files changed, 726 insertions(+), 14 deletions(-)
 create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
 create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..6d8de64471f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/skey.h"
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "access/xact.h"
+#include "catalog/index.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "executor/spi.h"
@@ -91,6 +94,7 @@
 #define RI_TRIGTYPE_UPDATE 2
 #define RI_TRIGTYPE_DELETE 3
 
+typedef struct FastPathMeta FastPathMeta;
 
 /*
  * RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
 	Oid			period_intersect_oper;	/* anyrange * anyrange (or
 										 * multiranges) */
 	dlist_node	valid_link;		/* Link in list of valid entries */
+
+	Oid			conindid;
+	bool		pk_is_partitioned;
+
+	FastPathMeta *fpmeta;
 } RI_ConstraintInfo;
 
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+	RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+	RegProcedure regops[RI_MAX_NUMKEYS];
+	Oid			subtypes[RI_MAX_NUMKEYS];
+	int			strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
 /*
  * RI_QueryKey
  *
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 							TupleTableSlot *oldslot, TupleTableSlot *newslot,
 							bool is_restrict,
 							bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+							 Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+								IndexScanDesc scandesc, TupleTableSlot *slot,
+								Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+								ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+						   bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+									 TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+								 Relation idx_rel, Datum *pk_vals,
+								 char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+										  Relation fk_rel, Relation idx_rel);
 static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
 							 const RI_ConstraintInfo *riinfo, bool rel_is_pk,
 							 Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
 	if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
 		return PointerGetDatum(NULL);
 
-	/*
-	 * Get the relation descriptors of the FK and PK tables.
-	 *
-	 * pk_rel is opened in RowShareLock mode since that's what our eventual
-	 * SELECT FOR KEY SHARE will get on it.
-	 */
 	fk_rel = trigdata->tg_relation;
-	pk_rel = table_open(riinfo->pk_relid, RowShareLock);
 
 	switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
 	{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
 			 * No further check needed - an all-NULL key passes every type of
 			 * foreign key constraint.
 			 */
-			table_close(pk_rel, RowShareLock);
 			return PointerGetDatum(NULL);
 
 		case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
 							 errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
 							 errtableconstraint(fk_rel,
 												NameStr(riinfo->conname))));
-					table_close(pk_rel, RowShareLock);
 					return PointerGetDatum(NULL);
 
 				case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
 					 * MATCH SIMPLE - if ANY column is null, the key passes
 					 * the constraint.
 					 */
-					table_close(pk_rel, RowShareLock);
 					return PointerGetDatum(NULL);
 
 #ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
 			break;
 	}
 
+	/*
+	 * Fast path: probe the PK unique index directly, bypassing SPI.
+	 *
+	 * For non-partitioned, non-temporal FKs, we can skip the SPI machinery
+	 * (plan cache, executor setup, etc.) and do a direct index scan + tuple
+	 * lock.  This is semantically equivalent to the SPI path below but avoids
+	 * the per-row executor overhead.
+	 *
+	 * ri_FastPathCheck() reports the violation itself (via ereport) if no
+	 * matching PK row is found, so it only returns on success.
+	 */
+	if (ri_fastpath_is_applicable(riinfo))
+	{
+		ri_FastPathCheck(riinfo, fk_rel, newslot);
+		return PointerGetDatum(NULL);
+	}
+
 	SPI_connect();
 
+	/*
+	 * pk_rel is opened in RowShareLock mode since that's what our eventual
+	 * SELECT FOR KEY SHARE will get on it.
+	 */
+	pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
 	/* Fetch or prepare a saved plan for the real check */
 	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
 
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
 
 	riinfo->valid = true;
 
+	riinfo->conindid = conForm->conindid;
+	riinfo->pk_is_partitioned =
+		(get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+	riinfo->fpmeta = NULL;
+
 	return riinfo;
 }
 
@@ -2617,6 +2673,382 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	return SPI_processed != 0;
 }
 
+/*
+ * ri_FastPathCheck
+ *		Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via ri_ReportViolation();
+ * otherwise, the function returns normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+				 Relation fk_rel, TupleTableSlot *newslot)
+{
+	Relation	pk_rel;
+	Relation	idx_rel;
+	IndexScanDesc scandesc;
+	TupleTableSlot *slot;
+	Datum		pk_vals[INDEX_MAX_KEYS];
+	char		pk_nulls[INDEX_MAX_KEYS];
+	ScanKeyData skey[INDEX_MAX_KEYS];
+	bool		found = false;
+	Oid			saved_userid;
+	int			saved_sec_context;
+	Snapshot	snapshot;
+
+	/*
+	 * Advance the command counter so the snapshot sees the effects of prior
+	 * triggers in this statement.  Mirrors what the SPI path does in
+	 * ri_PerformCheck().
+	 */
+	CommandCounterIncrement();
+	snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+	pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+	idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+	slot = table_slot_create(pk_rel, NULL);
+	scandesc = index_beginscan(pk_rel, idx_rel,
+							   snapshot, NULL,
+							   riinfo->nkeys, 0);
+
+	if (riinfo->fpmeta == NULL)
+		ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+									  fk_rel, idx_rel);
+	Assert(riinfo->fpmeta);
+
+	GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+	SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+						   saved_sec_context |
+						   SECURITY_LOCAL_USERID_CHANGE |
+						   SECURITY_NOFORCE_RLS);
+	ri_CheckPermissions(pk_rel);
+
+	ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+	build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+	found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+								snapshot, riinfo, skey, riinfo->nkeys);
+	SetUserIdAndSecContext(saved_userid, saved_sec_context);
+	index_endscan(scandesc);
+	ExecDropSingleTupleTableSlot(slot);
+	UnregisterSnapshot(snapshot);
+
+	if (!found)
+		ri_ReportViolation(riinfo, pk_rel, fk_rel,
+						   newslot, NULL,
+						   RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+	index_close(idx_rel, NoLock);
+	table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ *		Probe the PK index for one set of scan keys and lock the matching
+ *		tuple.
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ *
+ * The caller must ensure CurrentMemoryContext is long-lived enough
+ * for the scan descriptor's internal allocations (typically
+ * TopTransactionContext when using a cached scandesc).
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+					IndexScanDesc scandesc, TupleTableSlot *slot,
+					Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+					ScanKeyData *skey, int nkeys)
+{
+	bool		found = false;
+
+	index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+	if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+	{
+		bool		concurrently_updated;
+
+		if (ri_LockPKTuple(pk_rel, slot, snapshot,
+						   &concurrently_updated))
+		{
+			if (concurrently_updated)
+				found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+			else
+				found = true;
+		}
+	}
+
+	return found;
+}
+
+/*
+ * ri_LockPKTuple
+ *		Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+			   bool *concurrently_updated)
+{
+	TM_FailureData tmfd;
+	TM_Result	result;
+	int			lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+	*concurrently_updated = false;
+
+	if (!IsolationUsesXactSnapshot())
+		lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+	result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+							  slot, GetCurrentCommandId(false),
+							  LockTupleKeyShare, LockWaitBlock,
+							  lockflags, &tmfd);
+
+	switch (result)
+	{
+		case TM_Ok:
+			if (tmfd.traversed)
+				*concurrently_updated = true;
+			return true;
+
+		case TM_Deleted:
+			if (IsolationUsesXactSnapshot())
+				ereport(ERROR,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("could not serialize access due to concurrent update")));
+			return false;
+
+		case TM_Updated:
+			if (IsolationUsesXactSnapshot())
+				ereport(ERROR,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("could not serialize access due to concurrent update")));
+
+			/*
+			 * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+			 * chain and returned TM_Ok.  Getting here means something
+			 * unexpected -- fall through to error.
+			 */
+			elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+			break;
+
+		case TM_SelfModified:
+
+			/*
+			 * The current command or a later command in this transaction
+			 * modified the PK row.  This shouldn't normally happen during an
+			 * FK check (we're not modifying pk_rel), but handle it safely by
+			 * treating the tuple as not found.
+			 */
+			return false;
+
+		case TM_Invisible:
+			elog(ERROR, "attempted to lock invisible tuple");
+			break;
+
+		default:
+			elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+			break;
+	}
+
+	return false;				/* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+	/*
+	 * Partitioned referenced tables are skipped for simplicity, since they
+	 * require routing the probe through the correct partition using
+	 * PartitionDirectory.
+	 */
+	if (riinfo->pk_is_partitioned)
+		return false;
+
+	/*
+	 * Temporal foreign keys use range overlap and containment semantics (&&,
+	 * <@, range_agg()) that inherently involve aggregation and multiple-row
+	 * reasoning, so they stay on the SPI path.
+	 */
+	if (riinfo->hasperiod)
+		return false;
+
+	return true;
+}
+
+/*
+ * ri_CheckPermissions
+ *   Check that the current user has permissions to look into the schema of
+ *   and SELECT from 'query_rel'
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+	AclResult	aclresult;
+
+	/* USAGE on schema. */
+	aclresult = object_aclcheck(NamespaceRelationId,
+								RelationGetNamespace(query_rel),
+								GetUserId(), ACL_USAGE);
+	if (aclresult != ACLCHECK_OK)
+		aclcheck_error(aclresult, OBJECT_SCHEMA,
+					   get_namespace_name(RelationGetNamespace(query_rel)));
+
+	/* SELECT on relation. */
+	aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+								  ACL_SELECT);
+	if (aclresult != ACLCHECK_OK)
+		aclcheck_error(aclresult, OBJECT_TABLE,
+					   RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ *		After following an update chain (tmfd.traversed), verify that
+ *		the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine.  A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+						 TupleTableSlot *new_slot)
+{
+	/*
+	 * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+	 * This only fires on the concurrent-update path (tmfd.traversed), which
+	 * should be rare, so the cost is acceptable for now.  If profiling shows
+	 * otherwise, cache the IndexInfo in FastPathMeta.
+	 */
+	IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+	Datum		values[INDEX_MAX_KEYS];
+	bool		isnull[INDEX_MAX_KEYS];
+	bool		matched = true;
+
+	/* PK indexes never have these. */
+	Assert(indexInfo->ii_Expressions == NIL &&
+		   indexInfo->ii_ExclusionOps == NULL);
+
+	/* Form the index values and isnull flags given the table tuple. */
+	FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+	for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+	{
+		ScanKeyData *skey = &skeys[i];
+
+		/* A PK column can never be set to NULL. */
+		Assert(!isnull[i]);
+		if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+											skey->sk_collation,
+											values[i],
+											skey->sk_argument)))
+		{
+			matched = false;
+			break;
+		}
+	}
+
+	return matched;
+}
+
+/*
+ * build_index_scankeys
+ *		Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation.  Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+					 Relation idx_rel, Datum *pk_vals,
+					 char *pk_nulls, ScanKey skeys)
+{
+	FastPathMeta *fpmeta = riinfo->fpmeta;
+
+	Assert(fpmeta);
+
+	/*
+	 * May need to cast each of the individual values of the foreign key to
+	 * the corresponding PK column's type if the equality operator demands it.
+	 */
+	for (int i = 0; i < riinfo->nkeys; i++)
+	{
+		if (pk_nulls[i] != 'n')
+		{
+			RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+			if (OidIsValid(entry->cast_func_finfo.fn_oid))
+				pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+										   pk_vals[i],
+										   Int32GetDatum(-1),	/* typmod */
+										   BoolGetDatum(false));	/* implicit coercion */
+		}
+	}
+
+	/*
+	 * Set up ScanKeys for the index scan. This is essentially how
+	 * ExecIndexBuildScanKeys() sets them up.
+	 */
+	for (int i = 0; i < riinfo->nkeys; i++)
+	{
+		int			pkattrno = i + 1;
+
+		ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+							   fpmeta->strats[i], fpmeta->subtypes[i],
+							   idx_rel->rd_indcollation[i], fpmeta->regops[i],
+							   pk_vals[i]);
+	}
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ *		Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column.  Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+							  Relation fk_rel, Relation idx_rel)
+{
+	FastPathMeta *fpmeta;
+	MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+	Assert(riinfo != NULL && riinfo->valid);
+
+	fpmeta = palloc_object(FastPathMeta);
+	for (int i = 0; i < riinfo->nkeys; i++)
+	{
+		Oid			eq_opr = riinfo->pf_eq_oprs[i];
+		Oid			typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+		Oid			lefttype;
+		RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+		fpmeta->compare_entries[i] = entry;
+		fpmeta->regops[i] = get_opcode(eq_opr);
+
+		get_op_opfamily_properties(eq_opr,
+								   idx_rel->rd_opfamily[i],
+								   false,
+								   &fpmeta->strats[i],
+								   &lefttype,
+								   &fpmeta->subtypes[i]);
+	}
+
+	riinfo->fpmeta = fpmeta;
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Extract fields from a tuple into Datum/nulls arrays
  */
@@ -3112,8 +3544,11 @@ ri_CompareWithCast(Oid eq_opr, Oid typeid, Oid collid,
 /*
  * ri_HashCompareOp -
  *
- * See if we know how to compare two values, and create a new hash entry
- * if not.
+ * Look up or create a cache entry for the given equality operator and
+ * the caller's value type (typeid).  The entry holds the operator's
+ * FmgrInfo and, if typeid doesn't match what the operator expects as
+ * its right-hand input, a cast function to coerce the value before
+ * comparison.
  */
 static RI_CompareHashEntry *
 ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3169,8 +3604,14 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
 		 * moment since that will never be generated for implicit coercions.
 		 */
 		op_input_types(eq_opr, &lefttype, &righttype);
-		Assert(lefttype == righttype);
-		if (typeid == lefttype)
+
+		/*
+		 * pf_eq_oprs (used by the fast path) can be cross-type when the
+		 * FK and PK columns differ in type, e.g. int48eq for int4 PK /
+		 * int8 FK.  If the FK column's type already matches what the
+		 * operator expects as its right-hand input, no cast is needed.
+		 */
+		if (typeid == righttype)
 			castfunc = InvalidOid;	/* simplest case */
 		else
 		{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR:  insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+         2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+         1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+        1|         1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+         1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+        1|         1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR:  could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+         2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+         1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+        2|         1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
 test: fk-snapshot
 test: fk-snapshot-2
 test: fk-snapshot-3
+test: fk-concurrent-pk-upd
 test: subxid-overflow
 test: eval-plan-qual
 test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+  CREATE TABLE parent (
+    parent_key int PRIMARY KEY,
+    aux   text NOT NULL
+  );
+
+  CREATE TABLE child (
+    child_key int PRIMARY KEY,
+    parent_key int8 NOT NULL REFERENCES parent
+  );
+
+  INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+  DROP TABLE parent, child;
+}
+
+session s1
+step s1b  { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b  { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
 DROP TABLE FKTABLE;
 DROP TABLE PKTABLE;
 --
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR:  permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
 -- Check initial check upon ALTER TABLE
 --
 CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
 DROP TABLE FKTABLE;
 DROP TABLE PKTABLE;
 
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
 --
 -- Check initial check upon ALTER TABLE
 --
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..c51a0a903a6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -815,6 +815,7 @@ ExtensionInfo
 ExtensionLocation
 ExtensionSiblingCache
 ExtensionVersionInfo
+FastPathMeta
 FDWCollateState
 FD_SET
 FILE
-- 
2.47.3


